ALGORITHMIC DETERMINATION OF FLANKING DNA
SEQUENCES THAT CONTROL THE EXPRESSION OF SETS
OF GENES IN PROKARYOTIC, ARCHEA AND EUKARYOTIC
GENOMES



Reference to Related Application

The present application is the subject of Provisional 
Application Serial No. 60/208,650 filed June 2, 2000 
entitled ALGORITHMIC DETERMINATION OF CONNECTRONS FOR THE 
HIGH LEVEL REGULATION OF GENE EXPRESSION. 

Introduction

RNA introduced into a cell by a virus is now known to
trigger a cellular defense mechanism known as post-
transcriptional gene silencing (PTGS). If the viral RNA
sequence matches a sequence within the cell's genome, the
associated genes are turned off or silenced. This
phenomenon is also called 'RNA interference' or RNAi. A
single-stranded RNA can interact with another single-
stranded RNA (known as antisense RNA). The single-
stranded RNA can also form a triple-stranded complex with
double-stranded DNA. This triple-stranded complex is
known as a Hoogsteen helix. This patent application
shows how two specific adjacent single-stranded RNA
sequences (called C1 and C2 - for Control Sequence
1 and Control Sequence 2) interact with two distant
double-stranded DNA sequences (called T1 and T2 - for
Target Sequence 1 and Target Sequence 2) to form a
tetradic relationship which is called a "connectron".
The two distant DNA double-stranded sequences (T1 and T2)
must be on the same chromosome in a genome and they must
be between about 1kb and 105kb of each other. The
adjacent single-stranded RNA sequences (C1/C2) can be on
the same chromosome as the T1 and T2 sequences or on a
different chromosome. The C1 sequence is identical to the
T1 sequence and the C2 sequence is identical to the T2
sequence. The connectron acts to stabilize the double-
stranded DNA by allowing 30nm chromatin particles to
form. Genes that lie between the T1 and T2 sequences
when wrapped up in 30nm chromatin particles are not open
to promotion and expression. The connectron (i.e. the
tetradic relationship between the T1-T2 sequences and
C1/C2 sequences) provides a general explanation for PTGS.
A connectron can be implemented by RNA sequences, PNA
(Peptide Nucleic Acid) sequences or by a zinc-finger DNA
Binding Protein (DBP) specific to the T1 and T2
sequences.

Characteristically the adjacent C1/C2 sequences lie in
the 3'UTR of a gene. The T1 and T2 sequences do not lie
within the translated region of any gene. These
sequences "surround" one or more genes. There are,
however, T1 and T2 sequence pairs that surround one or
more C1/C2 sequences that are not 3'UTR to any gene.
These are called "geneless connectrons". There may be
promoter sequences that cause the transcription of these
3'UTR sequences.

A computer-based algorithm that is similar to the
algorithm used in the US Patent 6,205,404 has been
developed to determine the connectron structure of any
genome. This algorithm determines the existence of all
the connectrons in the genomic DNA. Connectrons exist in
prokaryotes, archea, single-celled eukaryotes, multi-
celled eukaryotes, plants and higher animals. Connectron
relationships exist between prokaryotes and their
plasmids. The geneless connectrons provide a possible
mechanism for forming a hierarchy of gene expression
control that will produce an understanding of cell
differentiation and tissue development.



Each connectron is a unique tetrad of sequences. Each
connectron changes the expression of the genes between
the T1 and T2 sequences. The C1 sequence (which is
equivalent to the T1 sequence) and the C2 sequence (which
is equivalent to the T2 sequence) are determined by the
invention described in this patent application. In
general, the tetrad of connectron sequences can be
patented because the structure of matter is known and the
function of specific gene expression modulation is also
known. Gene expression modification can be produced by
introducing antisense RNA or PNA to interact with the
C1/C2 RNA sequences or zinc-finger DBPs to interact with
the T1 and T2 sequences. Using connectrons it will be
possible to modify cellular and tissue behavior in a very
general manner.

Examples will be given from different genomes to
illustrate that the connectron is a perfectly general and
universal concept.






Definitions 



Double stranded DNA - Watson and Crick showed in 1953
that DNA naturally forms a double-stranded helix. A
typical double stranded sequence is

5'-TAGAGGAGTACCAC-3'
3'-ATCTCCTCATGGTG-5'

Hydrogen Bond - The force between a hydrogen atom and
another heavier atom such as Oxygen (O), Nitrogen (N),
Phosphorus (P), or Sulfur (S).

Positive strand - The positive strand is normally
represented 5' to 3' running left to right as in

5'-TAGAGGAGTACCAC-3'

Negative strand - The negative strand is normally
represented 5' to 3' running right to left as in

3'-ATCTCCTCATGGTG-5'

Single stranded RNA - Either the positive or the negative
strand of the double-stranded DNA can be transcribed by
the polymerase. In RNA, U replaces T.

RNA of positive strand sequence 5'-UAGAGGAGUACCAC-3'
RNA of negative strand sequence 5'-GUGGUACUCCUCUA-3'

Antisense RNA - The antisense strand of any RNA sequence
is the complement sequence

RNA sequence           5'-UAGAGGAGUACCAC-3'
Antisense RNA sequence 3'-AUCUCCUCAUGGUG-5'



Triple Strand Helix - The RNA sequence of an RNA/DNA
triple-strand complex is the same as the positive strand
of the DNA

DNA positive strand 5'-TAGAGGAGTACCAC-3'
DNA negative strand 3'-ATCTCCTCATGGTG-5'
RNA strand          5'-UAGAGGAGUACCAC-3'
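
The strand conventions defined above can be reproduced with a few string operations. The following is a minimal sketch only; the function names are illustrative assumptions and do not appear in this application. It uses the example sequence TAGAGGAGTACCAC from the definitions.

```python
# Illustrative helpers for the strand definitions above.
# All function names are assumptions made for this sketch.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def negative_strand_3_to_5(positive: str) -> str:
    """Base-by-base complement, written 3'->5' beneath the positive strand."""
    return positive.translate(_DNA_COMPLEMENT)


def rna_of_positive_strand(positive: str) -> str:
    """RNA carrying the positive-strand sequence (U replaces T), read 5'->3'."""
    return positive.replace("T", "U")


def rna_of_negative_strand(positive: str) -> str:
    """RNA carrying the negative-strand sequence, read 5'->3'.

    Reading the negative strand 5'->3' means reversing its 3'->5' form.
    """
    return negative_strand_3_to_5(positive)[::-1].replace("T", "U")


def antisense_3_to_5(rna: str) -> str:
    """Antisense (complement) of an RNA, written 3'->5' beneath the RNA."""
    return rna.translate(_RNA_COMPLEMENT)


if __name__ == "__main__":
    positive = "TAGAGGAGTACCAC"
    print("negative strand 3'->5':", negative_strand_3_to_5(positive))
    print("RNA of positive strand:", rna_of_positive_strand(positive))
    print("RNA of negative strand:", rna_of_negative_strand(positive))
    print("antisense RNA 3'->5'  :", antisense_3_to_5(rna_of_positive_strand(positive)))
```

In the triple-strand case the RNA strand is simply the output of rna_of_positive_strand, since it matches the positive DNA strand.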

Promoter - Any region of DNA that binds proteins which
engage the polymerase transcription mechanism.

TATA Box - A region near the 3' end of a promoter with
the sequence TATA.

mRNA - The RNA produced from the DNA by the polymerase as
a result of transcription

Start of transcription - The 3' end of a promoter where
the polymerase mechanism begins to transcribe DNA into
mRNA.

Exon - Any region of mRNA which is used to code for
proteins

Intron - Any region of mRNA lying between two exons which
is not used to code for proteins. The introns are edited
out of the initial RNA transcript to form the mature
mRNA.

3' UTR - The untranslated 3' end of an mRNA is beyond the
end of the last exon. A stop codon in the mRNA causes
the ribosome to stop the translation of mRNA into
protein.

End of translation - The 3' end of the 3'-most exon.

Translated region - Any collection of exons and introns.

Gene - Any DNA region that codes for a protein. Introns
do not occur in prokaryotic genes and they sometimes fail
to occur in eukaryotic genes. A typical model of a gene
is



|<---- Promoter ---->|
        |<-TATA Box->|
                     |<-Beginning of Translation
                     |<------------- Translated Region ------------->|
                                                  End of Translation->|
                     |<-Exon->|<-Intron->|<-Exon->|<-Intron->|<-Exon->|<-3' UTR->|

+ strand
- strand

|<----------------------------------- Gene ----------------------------------->|

Positive strand gene - Any gene in which the features run 
5' to 3' on the positive strand 



Negative strand gene - Any gene in which the features run 
5' to 3' on the negative strand 

C1 sequence - Any positive or negative strand DNA
sequence of 20 bases or more. The C2 sequence must occur
in the same chromosome as the C1 sequence.

C2 sequence - Any positive or negative strand DNA
sequence of 20 bases or more. The C1 sequence must occur
in the same chromosome as the C2 sequence.

C1/C2 - Any positive or negative strand DNA sequence of
40 or more bases such that the C1 sequence is adjacent to
the C2 sequence

T1 sequence - Any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T2 sequence. The T1 and T2 sequences
must be between about 1kb and 105kb apart.

T2 sequence - Any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T1 sequence. The T2 and T1 sequences
must be between about 1kb and 105kb apart.

Last exon gap or Gap-Distance - The number of bases
between the end of transcription and the beginning of the
C1/C2 sequence. In prokaryotes and single-celled
eukaryotes this gap can range from no bases to 500 bases.
In multi-celled eukaryotes the gap can be as large as
10,000 bases.

Poly-adenylation signal - A number of Adenosine (A) bases
are added to the mRNA at the end of the 3'UTR.

Possible Connectron - Any set of T1, T2 and C1/C2
sequences such that the C1 sequence is identical to the
T1 sequence and the C2 sequence is identical to the T2
sequence. The promoter of some gene causes the mRNA of
the gene to be expressed. The mRNA is edited to
eliminate the introns. The whole mRNA including the
3'UTR can move about in the cell or the nucleus of the
cell. The C1/C2 RNA that is part of the 3'UTR moves to
the T1 and T2 DNA sequences. A triple-stranded complex
of the DNA and the RNA forms such that the C1 sequence
forms hydrogen bonds with the T1 sequence and the C2
sequence forms hydrogen bonds with the T2 sequence.
Because the C1 sequence is adjacent to the C2 sequence,
the T1 sequence is brought physically close to the T2
sequence. This produces a loop of between about 1kb and
105kb in the DNA. Histone proteins reduce the length of
the DNA by binding 200 bases. Histone/DNA complexes form
six-fold symmetry chromatin assemblies. The diameter of
the chromatin assemblies is approximately 30nm.
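
The search implied by the Possible Connectron definition can be illustrated with a naive exact-match scan. This is a hedged sketch, not the algorithm of this application or of US Patent 6,205,404: it assumes fixed-length C1 and C2 of k bases each, scans the positive strand only, and measures the T1-T2 separation between start positions. The names kmer_index and possible_connectrons are invented for the illustration.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# (control chromosome, C1 start, target chromosome, T1 start, T2 start)
Hit = Tuple[str, int, str, int, int]


def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every k-base substring of seq to the list of its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def possible_connectrons(chromosomes: Dict[str, str], k: int = 20,
                         min_sep: int = 1_000,
                         max_sep: int = 105_000) -> Iterator[Hit]:
    """Yield tetrads where adjacent C1/C2 match a T1 and a T2 that lie on
    one chromosome and are separated by min_sep..max_sep bases.

    Simplifications (assumptions of this sketch): exact matches only,
    fixed length k for all four sequences, positive strand only.
    """
    indexes = {name: kmer_index(seq, k) for name, seq in chromosomes.items()}
    for c_name, c_seq in chromosomes.items():
        for p in range(len(c_seq) - 2 * k + 1):
            c1 = c_seq[p:p + k]            # control sequence 1
            c2 = c_seq[p + k:p + 2 * k]    # control sequence 2, adjacent to C1
            for t_name, index in indexes.items():
                for t1 in index.get(c1, ()):
                    for t2 in index.get(c2, ()):
                        # Do not report the C1/C2 site as its own target.
                        if t_name == c_name and (t1 == p or t2 == p + k):
                            continue
                        if min_sep <= abs(t2 - t1) <= max_sep:
                            yield (c_name, p, t_name, t1, t2)


if __name__ == "__main__":
    # Toy genome with shortened parameters so the example stays readable.
    toy = {
        "chrA": "ACGTTGCA",                            # C1 = ACGT, C2 = TGCA
        "chrB": "GG" + "ACGT" + "G" * 20 + "TGCA" + "GG",
    }
    hits = list(possible_connectrons(toy, k=4, min_sep=10, max_sep=50))
    print([h for h in hits if h[0] == "chrA"])
```

A genome-scale implementation would need a more memory-efficient index and would also scan the negative strand; the nested loops here are only meant to make the C1=T1, C2=T2 and distance conditions explicit.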


Real Connectron - Any Possible Connectron which is within 
the Gap-Distance of some gene 
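
The Gap-Distance test that separates a Real Connectron from a Possible Connectron can be sketched as follows. The limits of 500 and 10,000 bases come from the Gap-Distance definition above; the coordinate convention (inclusive 0-based positions on the gene's strand), the organism labels and the function names are assumptions made for this illustration.

```python
# Upper gap limits taken from the Gap-Distance definition; the dictionary
# keys are hypothetical labels chosen for this sketch.
GAP_LIMITS = {
    "prokaryote": 500,
    "single_celled_eukaryote": 500,
    "multi_celled_eukaryote": 10_000,
}


def gap_distance(transcription_end: int, c1c2_start: int) -> int:
    """Number of bases lying strictly between the last transcribed base
    and the first base of the C1/C2 sequence."""
    return c1c2_start - transcription_end - 1


def is_real_connectron(transcription_end: int, c1c2_start: int,
                       organism: str) -> bool:
    """True when the C1/C2 of a possible connectron lies within the
    Gap-Distance of the gene that ends at transcription_end."""
    gap = gap_distance(transcription_end, c1c2_start)
    return 0 <= gap <= GAP_LIMITS[organism]


if __name__ == "__main__":
    print(is_real_connectron(100, 601, "prokaryote"))
    print(is_real_connectron(100, 602, "prokaryote"))
    print(is_real_connectron(100, 5_000, "multi_celled_eukaryote"))
```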

Homologous connectron - The T1 sequence and the T2
sequence are on the same chromosome as the C1/C2 sequence

Heterologous connectron - The T1 sequence and the T2
sequence are on a chromosome different from the
chromosome of the C1/C2 sequence

Permanent connectron - Any C1/C2 sequence, which is 3'
UTR to some gene that is not surrounded by any T1 and T2
sequence pairs

Transient connectron - Any C1/C2 sequence, which is 3'
UTR to some gene that is surrounded by one or more T1 and
T2 sequence pairs



Self-limiting connectron - Any C1/C2 sequence which is
3'UTR to some gene that is surrounded by the T1 and T2
sequences such that C1=T1 and C2=T2

Geneless connectron - Any C1/C2 sequence which is not
3'UTR to some gene but is surrounded by some T1 and T2.
A promoter may lie 5' to the C1/C2 sequence.
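
The connectron categories defined above can be expressed as a small decision procedure. The sketch below is illustrative only. It assumes each T1-T2 loop has already been matched to the C1/C2 that forms it (the hypothetical control_id field), and it approximates "the gene is surrounded" by testing whether the C1/C2 position lies between T1 and T2. Because a self-limiting connectron also satisfies the transient definition, the more specific label is returned.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Loop:
    chromosome: str
    t1_pos: int        # position of the T1 sequence
    t2_pos: int        # position of the T2 sequence
    control_id: str    # C1/C2 whose sequences satisfy C1=T1 and C2=T2


@dataclass(frozen=True)
class Control:
    control_id: str
    chromosome: str
    position: int            # position of the C1/C2 sequence
    in_3utr_of_gene: bool    # True when the C1/C2 is 3'UTR to some gene


def surrounding_loops(control: Control, loops: List[Loop]) -> List[Loop]:
    """Loops on the control's chromosome whose T1 and T2 flank the C1/C2."""
    return [
        loop for loop in loops
        if loop.chromosome == control.chromosome
        and min(loop.t1_pos, loop.t2_pos) < control.position
        < max(loop.t1_pos, loop.t2_pos)
    ]


def classify(control: Control, loops: List[Loop]) -> str:
    around = surrounding_loops(control, loops)
    if not control.in_3utr_of_gene:
        # Not covered by the definitions when no loop surrounds it.
        return "geneless" if around else "unclassified"
    if not around:
        return "permanent"
    if any(loop.control_id == control.control_id for loop in around):
        return "self-limiting"
    return "transient"


def locality(control: Control, loop: Loop) -> str:
    """Homologous when the loop is on the C1/C2's own chromosome."""
    same = loop.chromosome == control.chromosome
    return "homologous" if same else "heterologous"


if __name__ == "__main__":
    loops = [Loop("chr1", 1_000, 50_000, "X"), Loop("chr1", 2_000, 60_000, "A")]
    for c in (Control("A", "chr1", 3_000, True),
              Control("B", "chr1", 1_500, True),
              Control("C", "chr2", 10, True),
              Control("D", "chr1", 1_500, False)):
        print(c.control_id, classify(c, loops))
```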

Bidirectionality of Connectron Excitation - A C1/C2 short
loop on one strand selects a T1-T2 long loop pair on the
same or the opposite strand. The C1/C2 short loop has a
complementary C1'/C2' sequence on the opposite strand.
Similarly the T1-T2 long loop pair has a complementary
long loop pair T1'-T2'. Wherever a C1/C2, T1-T2 tetrad
exists there is a complementary C1'/C2', T1'-T2' tetrad.
The C1/C2 short loop can be transcribed as a 3'UTR to a
gene on the same strand. The C1'/C2' short loop, which is
on the strand opposite to the C1/C2 short loop, can also
be transcribed as a 3'UTR to a gene on the same
strand. There are four possible models of action

            T1          T2          gene - C1/C2
+ strand
- strand

            T1          T2
+ strand
- strand
                                    C2/C1 - gene

+ strand
- strand
            T2'         T1'         C2'/C1' - gene

                                    gene - C1'/C2'
+ strand
- strand
            T2'         T1'



Of course, the short loops and the long loops do not have 
to be on the same chromosome. 

Hierarchy of connectron action - When a C1/C2 is
expressed it forms a T1-T2 loop by forming a connectron.
The C1/C2 sequence does not have to be on the same
chromosome as the T1 and T2 sequences. This provides a
way of causing interaction between chromosomes. When the
T1-T2 loop forms, any genes in that loop region which had
been expressing C1/C2 sequences in their 3'UTRs now
cease expressing the C1/C2 sequences. The connectrons
formed by these C1/C2 sequences will cease to exist after
some time, thus opening up the genes inside the respective
T1-T2 loops to expression. The hierarchy of connectron
action alternates between repression and expression.
The connectron hierarchies can be of any depth.
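
The alternation between repression and expression can be illustrated with a simple level-by-level propagation. This is a deliberately simplified sketch under stated assumptions: the hierarchy is given as a mapping from a gene whose 3'UTR carries a C1/C2 to the genes inside the T1-T2 loops it forms, each gene has a single controller, and timing ("after some time") is ignored. The names propagate and controls are hypothetical.

```python
from typing import Dict, List


def propagate(controls: Dict[str, List[str]], root: str) -> Dict[str, str]:
    """Return the state of every gene reachable from an expressed root.

    A gene inside a loop formed by an open (expressing) controller is
    'repressed'; a gene inside a loop whose controller is repressed is
    'open', because that controller's connectrons cease to exist.
    """
    state = {root: "open"}
    frontier = [root]
    while frontier:
        next_frontier = []
        for gene in frontier:
            child_state = "repressed" if state[gene] == "open" else "open"
            for child in controls.get(gene, []):
                if child not in state:   # single-controller assumption
                    state[child] = child_state
                    next_frontier.append(child)
        frontier = next_frontier
    return state


if __name__ == "__main__":
    # A -> B -> C -> D: a hierarchy of depth three below the root.
    hierarchy = {"A": ["B"], "B": ["C"], "C": ["D"]}
    for gene, status in propagate(hierarchy, "A").items():
        print(gene, status)
```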



One-to-Many connectron action - One C1/C2 sequence can
form connectrons in many different places on many
different chromosomes. The only requirement is that
C1=T1 and C2=T2. This makes it possible for one
expression event to control the expression of many genes
on different chromosomes.

Many-to-One connectron action - C1/C2s that come from
many different places on many different chromosomes can
form a connectron for a specific T1-T2 sequence pair.
The only requirement is that C1=T1 and C2=T2. This makes
it possible for many different expression events to
control the expression of one set of genes on a
particular chromosome.

Many-to-Many connectron action - The arrangement of
C1/C2s and T1-T2s across chromosomes can form a complex
web of gene expression control relationships.
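
The one-to-many, many-to-one and many-to-many relationships amount to a bipartite web between C1/C2 sites and T1-T2 loops. The following sketch assumes the connectrons have already been found and are supplied as (control identifier, loop identifier) pairs; the identifiers and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_web(connectrons: Iterable[Tuple[str, str]]
              ) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (fan_out, fan_in) for a list of (control, loop) connectrons.

    fan_out maps each C1/C2 to the T1-T2 loops it can form;
    fan_in maps each T1-T2 loop to the C1/C2s that can form it.
    """
    fan_out: Dict[str, Set[str]] = defaultdict(set)
    fan_in: Dict[str, Set[str]] = defaultdict(set)
    for control, loop in connectrons:
        fan_out[control].add(loop)
        fan_in[loop].add(control)
    return dict(fan_out), dict(fan_in)


def one_to_many(fan_out: Dict[str, Set[str]]) -> Set[str]:
    """C1/C2 sites that form connectrons at more than one T1-T2 loop."""
    return {c for c, loops in fan_out.items() if len(loops) > 1}


def many_to_one(fan_in: Dict[str, Set[str]]) -> Set[str]:
    """T1-T2 loops that can be formed by more than one C1/C2 site."""
    return {t for t, controls in fan_in.items() if len(controls) > 1}


if __name__ == "__main__":
    edges = [("c1", "loopA"), ("c1", "loopB"), ("c2", "loopB")]
    fan_out, fan_in = build_web(edges)
    print("one-to-many controls:", one_to_many(fan_out))
    print("many-to-one loops   :", many_to_one(fan_in))
```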

Percentage of the Genome Regulated by Connectrons - Since
the connectrons for a sequenced genome can be calculated,
the percentage of the genome that is open to connectron
regulation can be known.
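
One way to compute such a percentage is to merge the T1-T2 loop intervals and divide the covered bases by the genome length. The sketch below assumes that "open to connectron regulation" means "lying inside at least one T1-T2 loop", which is an interpretation rather than a definition given in this application. Intervals are half-open (start, end) positions on a single chromosome, and the function name is illustrative.

```python
from typing import Iterable, Tuple


def percent_regulated(loops: Iterable[Tuple[int, int]],
                      genome_length: int) -> float:
    """Percentage of bases that fall inside at least one T1-T2 loop.

    Overlapping loops are merged so that no base is counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted((min(a, b), max(a, b)) for a, b in loops):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start   # close the previous run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)          # extend the current run
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / genome_length


if __name__ == "__main__":
    print(percent_regulated([(0, 50), (25, 75), (100, 150)], 1_000))
```

For a genome with several chromosomes, the covered bases would be summed per chromosome before dividing by the total length.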

Emergent Property - The network of connectrons in any
genome emerges from a knowledge of the complete DNA
sequence of the genome. Because both the C1/C2 sequences
and the T1-T2 sequences can be any place in the genome,
the whole genomic sequence must be known before all the
connectrons can be determined.



Paradigm Shift - For the past fifty years since the
discovery by Watson and Crick of the double-helical
nature of DNA, the reigning paradigm for scientific
discovery has been the study of one gene and its effects
on the behavior of a cell. The advent of genomic
sequencing and this invention of connectrons that emerge
from the whole genome will produce a shift in the way
scientists view biological systems and the way they
formulate and execute experiments. The many-to-many
relationships between the connectrons mean that there
are many ways in which the expression of a set of genes
can be modulated. The multiplicity of control pathways
produces a system stability that makes it possible
for biological systems to be stable for long periods of
evolutionary time. The thinking that goes into
formulating scientific experiments will have to change to
accommodate the changes in understanding that will be
induced by the application and extension of this patent
application.

Hierarchy of DNA Structuring - The DNA of a cell's genome
is structured in a hierarchy of six levels. Figures 1, 2
and 3 have been adapted from The Molecular Biology of the
Cell by Alberts, Bray, Lewis, Raff, Roberts and Watson
[third edition pages 354, 345 and 348]. As shown in
figure 1, the double stranded DNA is level 1. The
double-stranded DNA is wrapped around histone proteins to
form a chromatin particle that is level 2 of the
hierarchy. Level 2 is described as "beads-on-a-string"
in figure 1. The chromatin particles are packed in a
six-fold symmetry as shown in figure 2a and figure 2b.
These six-fold assemblies have a diameter of 30 nm and
constitute level 3 of the hierarchy. Each
30 nm assembly contains from 18 (i.e. 6 * 3) to 30 (i.e.
6 * 5) chromatin particles. The 30 nm assemblies
aggregate into large loops which range in length from
5,000 bases to 100,000 bases of DNA. The size of these
large loops as shown in figure 1 is approximately 300 nm.
These large loops constitute level 4 of the structuring
hierarchy. As shown in figure 1, at level 5 of the DNA
structuring hierarchy many large loops are condensed to
form a structure which is approximately 700 nm in
diameter. The complete chromosome that constitutes level
6 of the hierarchy is composed of two very long sections
of level 5 DNA.



Model of Chromatin Structure - The level 4 structure of
DNA as shown in figure 1 ranges in length from 5,000 to
105,000 bases of DNA. Figure 3 shows that proteins are
thought to connect portions of the long loops formed by
the 30 nm particles to form a chromosome axis. These
condensed long loops are described as chromomeres in The
Molecular Biology of the Cell.



Prior Art 

The chromomere model of DNA structuring was presented by
N. A. Resnik, et al. [1] and is based on electron
microscopic data. There are more recent papers studying
a variety of genomes with electron microscopy but no
equivalent study of chromomeres has been done on a fully
sequenced genome.

A recent News Feature in Nature by T. Gura [2] described
the discovery of post-transcriptional gene silencing in
which viral RNA interacts with the transcribed RNA of the
cell to silence the expression of genes. This article
describes experiments in C. elegans and D. melanogaster
in which RNA that is complementary to mRNA is introduced
into a cell. This "antisense" RNA has the effect of
turning off the expression of one or more genes. The
introduced complementary RNA produces an "RNA
interference" called RNAi.

Thomas Werner and his colleagues at Genomatix in Munich,
Germany have developed an approach to understanding what
they call the "Matrix Attachment Region" (MAR). Figure 5
shows their interpretation of the structure of DNA
surrounding a gene. The following description of the
MAR is copied from the Genomatix web site

"Matrix Attachment Regions (MARs) MARs are sequence
regions that are responsible for the attachment of
genomic DNA to the nuclear matrix or scaffold.
Transcription absolutely requires anchorage of genomic
DNA to the nuclear matrix.

Functional features of MARs:

Anchoring of regulatory elements like promoters and
enhancers to the nuclear matrix.

Ensuring long term activity of promoters and
enhancers in chromatin.

Insulation, rendering a functional domain
insensitive to position effects.

Genomatix is conducting a research project to define and
detect MARs by computer-analysis."

Brief Description of the Objects of the Invention



An object of the invention is to provide a method of
identifying DNA sequences that control the expression of
different collections of genes in a genome comprising
detecting selected DNA sequences adjacent to some genes
excluding exons and introns.

An object of the invention is to provide a method of
identifying DNA sequences that control the expression of
different collections of genes comprising detecting, by
computer, one or more pairs of non-adjacent DNA sequences
to which two RNA sequences are bound.

An object of the invention is to provide a method of
identifying DNA sequences that control the expression of
different collections of genes in a genome comprising
detecting changes in connectron behavior in the genome.

An object of the invention is to provide a method of
modifying the expression of different gene collections in
a genome, comprising detecting changes in connectron
behavior as a result of an exogenous stimulus.

An object of the invention is to provide a method of
detecting where and when new genes are being integrated
into a host genome comprising detecting the connectrons
in said host genome.

An object of the invention is to provide a method of
detecting the expression effect of different gene
collections in a given body comprising detecting the back
and forth flow of connectrons between the chromosomes
thereof.

An object of the invention is to provide a method of
modifying a given body comprising modifying the
connectron organization therein.

An object of the invention is to provide a method of
detecting connectron control and target sequences in a
given genome comprising:

determining the base composition of said genome,
determining one or more sites of control sequence
organization, and/or

determining one or more sites of target application.

An object of the invention is to provide a method of
determining the response of a cell in any tissue to
changes in the cell's environment and/or genetic
composition comprising providing a complete genomic DNA
sequence for the organism and determining the effect of
changes in connectrons due to application of a given
exogenous stimulus to the genome.

An object of the invention is to provide a method of
determining in prokaryotes, archea, single-celled
eukaryotes and multi-celled eukaryotes, the tetradic
relationship T1=C1 and T2=C2 where T1 and T2 are DNA
sequences 20 or more bases in length, where the C1
sequence is adjacent to the C2 sequence, where the T1 and
T2 sequences are on the same chromosome, and where the
C1/C2 sequences are on the same chromosome as T1 and T2
or where the C1/C2 sequences are on a chromosome
different from T1 and T2, wherein:

C1 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C2 sequence must
occur in the same chromosome as the C1 sequence,

C2 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C1 sequence must
occur in the same chromosome as the C2 sequence,

C1/C2 - any positive or negative strand DNA sequence
of 40 or more bases such that the C1 sequence is
adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart,
and

T2 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1kb and 105kb apart.
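
The conditions listed in this object can be collected into a single validity check for a candidate tetrad. The sketch below is illustrative: the Tetrad record, its field names and the choice to measure the T1-T2 distance between start positions are assumptions, while the 20-base minimum, the adjacency of C1 and C2, the same-chromosome rule for T1 and T2, and the 1kb to 105kb range are taken from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tetrad:
    c1: str               # control sequence 1
    c2: str               # control sequence 2
    t1: str               # target sequence 1
    t2: str               # target sequence 2
    c1_start: int         # start of C1 on its chromosome
    c2_start: int         # start of C2 on the same chromosome
    t1_chromosome: str
    t2_chromosome: str
    t1_start: int
    t2_start: int


def is_valid_tetrad(t: Tetrad, min_len: int = 20,
                    min_sep: int = 1_000, max_sep: int = 105_000) -> bool:
    """Check T1=C1, T2=C2, sequence lengths, C1/C2 adjacency, and that
    T1 and T2 share a chromosome and lie about 1kb to 105kb apart."""
    return (
        len(t.t1) >= min_len and len(t.t2) >= min_len
        and t.c1 == t.t1 and t.c2 == t.t2
        and t.c2_start == t.c1_start + len(t.c1)     # C2 directly follows C1
        and t.t1_chromosome == t.t2_chromosome
        and min_sep <= abs(t.t2_start - t.t1_start) <= max_sep
    )


if __name__ == "__main__":
    s1, s2 = "ACGT" * 5, "TTGCA" * 4                 # two 20-base sequences
    good = Tetrad(s1, s2, s1, s2, 100, 120, "chrIV", "chrIV", 5_000, 30_000)
    print(is_valid_tetrad(good))
    print(is_valid_tetrad(replace(good, t2_start=5_500)))
    print(is_valid_tetrad(replace(good, t2_chromosome="chrV")))
```

The chromosome of the C1/C2 pair is intentionally left unconstrained, since the text allows it to be the same as or different from that of T1 and T2.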



An object of the invention is to provide a method of
determining in prokaryotes, archea, single-celled
eukaryotes and multi-celled eukaryotes, the connectron
relationship that permits many different C1/C2 short
loops to control the existence of a T1-T2 long loop and
wherein said C1/C2 short loops can be on the same
chromosome or on different chromosomes from the T1-T2
long loop, wherein:

C1 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C2 sequence must
occur in the same chromosome as the C1 sequence,

C2 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C1 sequence must
occur in the same chromosome as the C2 sequence,

C1/C2 - any positive or negative strand DNA sequence
of 40 or more bases such that the C1 sequence is
adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart,
and

T2 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1kb and 105kb apart.



An object of the invention is to provide a method of
determining in prokaryotes, archea, single-celled
eukaryotes and multi-celled eukaryotes, the connectron
relationship that permits one C1/C2 short loop to control
the existence of many T1-T2 long loops, the C1/C2 short
loop can be on the same chromosome or on different
chromosomes from the T1-T2 long loops, wherein:

C1 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C2 sequence must
occur in the same chromosome as the C1 sequence,

C2 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C1 sequence must
occur in the same chromosome as the C2 sequence,

C1/C2 - any positive or negative strand DNA sequence
of 40 or more bases such that the C1 sequence is
adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart,
and

T2 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1kb and 105kb apart.



An object of the invention is to provide a method of
determining the connectron relationships between
prokaryotes and their plasmids wherein said connectrons
implement a control mechanism between the two genomes
that makes it possible for them to form a symbiotic
relationship, and in the case of D. radiodurans the
relationship is not symmetric, and the D. radiodurans
genome sends C1/C2 short loops to the MP1 plasmid,
wherein:

C1 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C2 sequence must
occur in the same chromosome as the C1 sequence,

C2 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C1 sequence must
occur in the same chromosome as the C2 sequence,

C1/C2 - any positive or negative strand DNA sequence
of 40 or more bases such that the C1 sequence is
adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart,
and

T2 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1kb and 105kb apart.



An object of the invention is to provide a method of
determining the connectron relationships that exist in
plants and higher animals.

An object of the invention is to provide a method of
determining in prokaryotes, archea, single-celled
eukaryotes and multi-celled eukaryotes, the connectron
relationship that permits one C1/C2 short loop to control
the existence of one or more T1-T2 long loops without
being subject to any expression controls other than those
of the gene to which the C1/C2 is 3'UTR, wherein:

C1 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C2 sequence must
occur in the same chromosome as the C1 sequence,

C2 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C1 sequence must
occur in the same chromosome as the C2 sequence,

C1/C2 - any positive or negative strand DNA sequence
of 40 or more bases such that the C1 sequence is
adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart,

T2 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1kb and 105kb apart,
and

3'UTR - the untranslated 3' end of an mRNA is beyond the
end of the last exon, a stop codon in the mRNA
causes the ribosome to stop the translation of mRNA
into protein.

An object of the invention is to provide a method of
determining in prokaryotes, archea, single-celled
eukaryotes and multi-celled eukaryotes, the connectron
relationship that permits one C1/C2 short loop to control
the existence of one or more T1-T2 long loops such that
this C1/C2 short loop is itself subject to expression
control by another T1-T2 long loop which surrounds it,
wherein:

C1 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C2 sequence must
occur in the same chromosome as the C1 sequence,

C2 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C1 sequence must
occur in the same chromosome as the C2 sequence,

C1/C2 - any positive or negative strand DNA sequence
of 40 or more bases such that the C1 sequence is
adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart,
and

T2 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1kb and 105kb apart.

An object of the invention is to provide a method of
determining in prokaryotes, archea, single-celled
eukaryotes and multi-celled eukaryotes, the connectron
relationship that permits one C1/C2 short loop to control
the existence of the T1-T2 long loop that surrounds it,
wherein:

C1 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C2 sequence must
occur in the same chromosome as the C1 sequence,

C2 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C1 sequence must
occur in the same chromosome as the C2 sequence,

C1/C2 - any positive or negative strand DNA sequence
of 40 or more bases such that the C1 sequence is
adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart,
and

T2 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1kb and 105kb apart.



An object of the invention is to provide a method of
determining the connectron relationships that do not have
any genes within the T1-T2 long loop, wherein:

T1 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T2 sequence, and

T2 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T1 sequence, and the T2 and T1
sequences must be between about 1kb and 105kb apart.



An object of the invention is to provide a method of
determining the geneless connectron relationship where
one C1/C2 short loop controls the existence of many
geneless T1-T2 long loops, wherein:

C1 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C2 sequence must
occur in the same chromosome as the C1 sequence,

C2 sequence - any positive or negative strand DNA
sequence of 20 bases or more, the C1 sequence must
occur in the same chromosome as the C2 sequence,

C1/C2 - any positive or negative strand DNA sequence
of 40 or more bases such that the C1 sequence is
adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart,
and

T2 sequence - any positive or negative strand DNA
sequence of 20 bases or more that is on the same
chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1kb and 105kb apart.

Description of the Drawings and Tables

The above and other objects, advantages and features of
the invention will become more apparent when considered
with the following specification and accompanying
drawings and tables wherein:

Figure 1 DNA is structured in six levels of increasing 
condensation. Double stranded DNA is level 1. 

10 Two turns of DNA are wrapped about each 

chromatin particle at level 2. The chromatin 
particles which each containing 200 base pairs 
form into 30 nm particles at level 3. The 30 
nm particles form into large loops with an 

15 approximate dimension of 300 nm at level 4. 

Metaphase chromosomes form a condensed 
structure with an approximate dimension of 7 00 
nm at level 5. An entire metaphase chromosome 
has a width of approximately 1400 nm at level 

20 6. The large loops at level 4 of the DNA 

structuring are thought to have between 20,000 
(20 kb) and 100,000 (100 kb) base pairs. 

The Molecular Biology of the Cell by Alberts, 
25 Bray, Lewis, Raff, Roberts and Watson, 3rd. ed. 

, Garland Publishing, Inc., New York, 1994, p. 
354 



Figure 2 (a) Chromatin DNA forms into six-fold
symmetry 30 nm particles.

(b) The six-fold symmetry 30 nm particles form a
linear chain with a varying number of repeat
units.

The Molecular Biology of the Cell by Alberts,
Bray, Lewis, Raff, Roberts and Watson, 3rd ed.,
Garland Publishing, Inc., New York, 1994, p. 345

Figure 3 Long loops of 30 nm particles are thought to be
closed at the bottom of the loop by proteins.

The Molecular Biology of the Cell by Alberts,
Bray, Lewis, Raff, Roberts and Watson, 3rd ed.,
Garland Publishing, Inc., New York, 1994, p. 348

Figure 4 (a) Transcription and Editing. (b) Movement of
the RNA through the Nucleus. (c) Connectron
Formation

Figure 5 Overview of schematic organization of a typical
transcriptionally active chromosomal loop.

From http://genomatix.gsf.de/func_genomics/functionalgenomics.html
Table 1 Connectron Properties for Prokaryotic, Archea
and Eukaryotic Genomes

Table 2 Yeast Inter-Chromosomal Connectron Distribution

Figure 6 Genome size plotted as a log-log function of
the Number of Connectrons

Figure 7 Number of Sequence Instances plotted as a
function of the Number of Fragments

Figure 8 Level 0 - The overall view of the algorithm

Figure 9 Level 1 - Process Flow of the Algorithm

Figure 10 Level 2a - two pages - Process Genome into
Blocking Fragment File

Figure 11 Level 2b - three pages - Compute the
Connectrons for a Genome

Figure 12 Level 2c - two pages - Analyze Possible
Connectrons

Figure 13 Level 3a - Setup Genome Usage Memory

Figure 14 Level 3b - Find DBP-Size Blocking File for T1-
Window

Figure 15 Level 3c - Find DBP-Size Blocking File for T2-
Window

Figure 16 Level 3d - two pages - Find C1/C2 Entries

Figure 17 Level 3e - two pages - Scan Genome Usage Memory
for Potential Connectrons



Description of the Invention 

A connectron is a relationship among four DNA sequences.
Each sequence must be at least 20 bases long. There is a
report by Sharp and Zamore [3] that RNA sequences of
"about length 25" are important as sources of RNAi. In
the computations reported here, 27 bases were actually
used as the minimum length of each of the sequences.
The T1 sequence is on one strand of some chromosome in a
genome. The T2 sequence is on the same strand of the
same chromosome as the T1 sequence. The T1 and T2
sequences (which are each at least 20 bases in length)
must be at least 5,000 bases distant from each other but
they cannot be more than 105,000 bases distant from each
other. The C1 sequence and the C2 sequence (which are
each at least 20 bases in length) are adjacent to each
other on some strand of some chromosome in the genome.
The C1/C2 sequences - called the "short loop" - can be on
the same strand as the T1 and T2 sequences or they can be
on the opposite strand. The C1/C2 sequences of the short
loop can be on the same chromosome as the T1 and T2
sequences but they can also be on a different chromosome
in the genome. When a genome has only one chromosome,
the point is moot. Many genomes, of course, have several
chromosomes. The C1 sequence is identical to the T1
sequence and the C2 sequence is identical to the T2
sequence.
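
The constraints in the paragraph above can be collected into a single check. The following Python sketch is illustrative only and is not the program used for the reported computations. The names Site and is_possible_connectron are hypothetical, distances are measured between start positions (an assumption), and "adjacent" is interpreted as "within 200 bases on the same chromosome and strand", borrowing the 200-base figure from the algorithm description later in this specification.

```python
from dataclasses import dataclass

MIN_LEN = 20         # minimum length of each of the four sequences
MIN_LOOP = 5_000     # minimum T1-T2 separation in bases
MAX_LOOP = 105_000   # maximum T1-T2 separation in bases


@dataclass(frozen=True)
class Site:
    """One sequence instance (hypothetical helper type)."""
    chromosome: int
    position: int    # 1-based start position
    strand: int      # 1 = positive, 2 = negative
    sequence: str


def is_possible_connectron(t1, t2, c1, c2, max_short_span=200):
    """Return True if the four sites satisfy the stated constraints."""
    # Every sequence must be at least MIN_LEN bases long.
    if any(len(s.sequence) < MIN_LEN for s in (t1, t2, c1, c2)):
        return False
    # T1 and T2 lie on the same strand of the same chromosome.
    if (t1.chromosome, t1.strand) != (t2.chromosome, t2.strand):
        return False
    # T1 and T2 are 5,000 to 105,000 bases apart.
    if not MIN_LOOP <= abs(t2.position - t1.position) <= MAX_LOOP:
        return False
    # C1 and C2 are close together on one strand of one chromosome
    # (assumed reading of "adjacent"); that chromosome may differ
    # from the one carrying T1 and T2.
    if (c1.chromosome, c1.strand) != (c2.chromosome, c2.strand):
        return False
    if abs(c2.position - c1.position) > max_short_span:
        return False
    # C1 is identical to T1 and C2 is identical to T2.
    return c1.sequence == t1.sequence and c2.sequence == t2.sequence
```

The additional requirement that the C1/C2 short loop lie next to a gene is deliberately left out of this check, since it is what separates a possible connectron from a real one.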

The C1/C2 sequence must be on the same strand as a gene
and must either be directly adjacent to the gene (i.e. a
gap of 0 bases) for prokaryotic genomes or, at this time,
be within 10,000 bases for eukaryotic genomes. The size
of the gap between the end of the gene and the beginning
of the C1/C2 sequence is a variable. The C1/C2 short
loop is expressed as the 3'UTR (Un-Translated Region) of
the gene. In the case of prokaryotic genes, which do not
normally have introns, the whole mRNA becomes the active
species for connectron formation. In the case of
eukaryotic genes, the whole transcript is the active
species for connectron formation upon editing of the
transcript to eliminate the introns. The whole
transcript can then move about in the cytoplasm of
prokaryotic cells or the nucleus of eukaryotic cells.
Since the C1 sequence is equivalent to the T1 sequence
and the C2 sequence is equivalent to the T2 sequence, the
C1 RNA can form a Hoogsteen triple-stranded RNA/DNA/DNA
helix with the double-stranded T1 sequence. Similarly
the C2 RNA can form a Hoogsteen triple-stranded
RNA/DNA/DNA helix with the double-stranded T2 sequence.
Because the C1 sequence and the C2 sequence are adjacent
to each other, the C1/T1 RNA/DNA/DNA Hoogsteen triple
helix is brought into physical adjacency to the C2/T2
RNA/DNA/DNA Hoogsteen triple helix. RNA/DNA/DNA hybrid
helices are the most stable form of triple helix. RNA
double helices, DNA double helices, RNA triple helices
and DNA triple helices are all significantly less stable
than an RNA/double-stranded DNA triple helix. The stable
physical adjacency of the two triple-stranded Hoogsteen
helices ensures that the long loop of double-stranded DNA
between the T1 sequence and the T2 sequence can then be
structured into 30 nm chromatin particles as shown in
level 4 of figure 1. The genes on either strand of the
DNA between the T1 sequence and the T2 sequence, when
they are structured into the 30 nm chromatin particles,
are not open to promotion and expression.

The tetradic relationship between the T1 and T2 sequences
that form the long loop and the C1/C2 sequences that form
the short loop is called a connectron. The name
"connectron" was suggested by J. David Rawn Ph.D. of
Towson University. A connectron is possible if the T1,
T2, C1 and C2 sequences exist. A connectron is real if
the C1/C2 short loop sequence is adjacent to an
expressible gene. If the adjacent gene is inside one or
more T1-T2 long loops then the connectron is said to be
transient. If the adjacent gene is not inside any
possible T1-T2 long loop then the connectron is said to
be permanent. If a connectron is inside of a T1-T2 long
loop that has the same sequences (i.e. T1 is really equal
to C1 and T2 is really equal to C2) then the connectron
is said to be self-limiting. This is true because once
the C1/C2 sequence is expressed it forms the T1-T2 long
loop that then shuts off the expression of the gene
adjacent to the C1/C2 sequence. Self-limiting
connectrons can also be called "spike" connectrons since
they generate a short-duration spike of the C1/C2 short
loop sequence. If a T1-T2 long loop does not contain any
genes but does contain C1/C2 short loop sequences then
this type of connectron is said to be geneless. The
C1/C2 short loops within a geneless T1-T2 long loop can,
of course, control the expression of genes.
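
The definitions above amount to a small set of rules, which the following Python sketch applies to one connectron. It is a hedged illustration rather than the classification code of the NearGene program: the function names, the (chromosome, start, end) representation of a long loop and the (chromosome, position) representation of a gene are all assumptions made for this example.

```python
def inside(gene, loop):
    """True if a gene at (chromosome, position) lies within a long
    loop given as (chromosome, start, end)."""
    chrom, position = gene
    loop_chrom, start, end = loop
    return chrom == loop_chrom and start <= position <= end


def classify_connectron(own_loop, adjacent_gene, other_loops,
                        genes_in_loop, short_loops_in_loop):
    """Return the set of labels that apply to one connectron.

    own_loop            -- the T1-T2 long loop this connectron forms
    adjacent_gene       -- gene next to the C1/C2 short loop, or None
    other_loops         -- every other possible T1-T2 long loop
    genes_in_loop       -- number of genes inside own_loop
    short_loops_in_loop -- number of C1/C2 short loops inside own_loop
    """
    # Without an adjacent expressible gene the connectron is only possible.
    if adjacent_gene is None:
        return {"possible"}
    labels = {"real"}
    in_own = inside(adjacent_gene, own_loop)
    if in_own:
        # The loop shuts off the gene that expresses its own C1/C2.
        labels.add("self-limiting")
    if in_own or any(inside(adjacent_gene, l) for l in other_loops):
        labels.add("transient")
    else:
        labels.add("permanent")
    if genes_in_loop == 0 and short_loops_in_loop > 0:
        labels.add("geneless")
    return labels
```

Returning a set rather than a single label reflects that the categories are not mutually exclusive: a self-limiting connectron is also transient, and a geneless connectron may be either transient or permanent.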

The physical existence and lifetimes of the connectrons
must be proved by molecular biological experimentation.
This physical experimental process, however, is logically
quite separate from the computational experiments that
were conducted from June of 1999 to May of 2001. The
computational search for the existence of connectrons has
been extremely positive. These computations have shown
that connectrons exist in prokaryotes, in archea, between
prokaryotes and their plasmids, in single-celled
eukaryotes, in multi-celled eukaryotes, in plants, in
higher animals and in humans. All of these features and
properties are described in the claims section that
follows.

The connectron invention is very powerful. It depends
only on sequence equivalency. The minimum length of the
four sequences seems to be about 20 bases. In the
calculations shown in this patent application, 27 bases
have been used as a minimum. The Nature News Feature [1]
says that other scientists have found RNA sequences of
length about 25 that have interesting gene silencing
properties. The Nature article does not give any
mechanism. Because of my algorithm and its use on a
variety of genomes, this patent application provides the
computational proof that a particular mechanism is highly
probable. The connectron invention provides an
explanation for how communication occurs within a
chromosome as well as between chromosomes in genomes that
have more than one chromosome. Since each T1-T2 long
loop can contain one or more genes, the connectron
invention provides a mechanism for turning on and turning
off sets of genes simultaneously. In time, the
connectron invention will provide an explanation for
differentiation, that is, for how one cell's behavior
differs from the behavior of another adjacent cell. It
is already clear from the computational experiments that
have been made on S. cerevisiae, C. elegans and D.
melanogaster that the number of geneless connectrons
increases dramatically as evolution proceeds from
single-celled eukaryotes (e.g. S. cerevisiae) to 1,000
cell eukaryotes (e.g. C. elegans) to visible creatures
(e.g. D. melanogaster). This evolutionary progression
can so far be extended only partially to plants (e.g.
A. thaliana), for which only three chromosomes are
sequenced, and to humans (H. sapiens), for which only one
chromosome is completely sequenced. Although the complete
human genome was published in Nature and Science in
February of 2001, the NIH-sponsored genomic sequencing
results are available for only about 1/3 of the bases in
the whole genome. The human genomic sequence determined
by Celera Genomics, Inc. is available only by
subscription. Table 1 shows how the genome size, the
number of genes, the number of gene-containing and
geneless connectrons and the percentage of genes
controlled are related in many different genomes.

The C1/C2 short loops originate on one chromosome. The
T1-T2 long loops can be on the same or different
chromosomes. Table 2, which is for yeast (S. cerevisiae),
is a square matrix of how many C1/C2 short loops on a
given chromosome are sent to form T1-T2 long loops on
other chromosomes. The diagonal of this matrix shows
that many chromosomes send connectrons to themselves.
The striking feature of this particular table is that
chromosome 6 only sends connectrons to chromosome 12 but
that it receives connectrons from chromosomes
4, 5, 7, 10, 12, 13, 15 and 16.
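
A matrix of this kind can be tallied directly from a list of connectrons. The Python sketch below is a minimal illustration under stated assumptions: each connectron is reduced to a hypothetical pair (sending chromosome of the C1/C2 short loop, receiving chromosome of the T1-T2 long loop), rows are taken as senders and columns as receivers, and the data shown are invented for the example and are not values from Table 2.

```python
def inter_chromosomal_matrix(connectrons, n_chromosomes):
    """Count connectrons by (sending, receiving) chromosome.

    connectrons   -- iterable of (source, target) chromosome numbers,
                     both 1-based
    n_chromosomes -- number of chromosomes in the genome
    """
    matrix = [[0] * n_chromosomes for _ in range(n_chromosomes)]
    for source, target in connectrons:
        matrix[source - 1][target - 1] += 1
    return matrix


# Invented example for a three-chromosome genome.
example = [(1, 1), (1, 3), (2, 3), (1, 3)]
matrix = inter_chromosomal_matrix(example, 3)
# The diagonal entries count connectrons a chromosome sends to itself.
self_sent = [matrix[i][i] for i in range(3)]
```

Reading along a row gives the chromosomes a given chromosome sends to, and reading down a column gives the chromosomes it receives from, which is how an asymmetry like the one noted for chromosome 6 would show up.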



Any tetrad of connectron sequences (i.e. the T1, T2, C1
and C2 sequences) as well as the fact of the adjacency of
the C1/C2 short loop sequence to the transcribing gene
can be patented because the content of matter and the
utility can be exactly described. The utility of a
connectron is that the T1-T2 long loop shuts off the
expression of the genes that lie between the T1 sequence
and the T2 sequence. In the case of geneless
connectrons, the utility is of a higher level in that the
C1/C2 short loops contained in the higher-level geneless
T1-T2 long loop eventually form other lower-level T1-T2
long loops around a set of genes.

The invention of connectrons comes at a particularly
important time in biological discovery. The geneless
connectrons make a many-to-many hierarchical control
mechanism possible. It is already clear from the
determination of the connectrons for C. elegans and D.
melanogaster that there are as many geneless connectrons
as there are genes, or more. It has been clear for
some time that the number of genes in a genome is not
particularly correlated with the size of the genome.
Figure 6 shows that the size of a genome is roughly
linearly correlated with the number of connectrons.

The connectron invention can be used to generate a model
of behavior in any cell. The simulation of connectron
behavior in different genomes will be the subject of
another patent application.

The connectron invention provides for a rational
exploitation of the information contained in the raw
genomic DNA sequence by forming a hierarchy of
relationships between geneless connectrons, transient
connectrons, permanent connectrons, self-limiting
connectrons and the expression of genes.
Detailed Description of the Invention

The algorithm for the determination of connectrons in any
genome or any genome fragment is represented in the
following flow diagrams. The Level 0 diagram in figure 8
shows the general relationships in a digital computer.
The central processor of the digital computer uses the
computer program to take genome descriptors, the genomic
DNA sequences and the tables of gene features to produce
a file of blocking fragments and a file of the optimal
connectrons for the genome. The printer serves to make
hard copies of the files and this patent application.
The level 1 diagram in figure 9 shows the three essential
steps in the determination of connectrons. The genome is
first processed into a blocking fragment file. Then the
blocking fragments are used to compute the connectrons
for the genome. Finally the potential connectrons are
analyzed to determine if the C1/C2 sequences are in the
3'UTR of a gene. The level 2a diagram in figure 10 shows
the steps required for the processing of the genome into
a file of blocking fragments. The genomic DNA sequence
is decomposed into 27-base frames for both the positive
and negative strands. These fragments are written to the
unsorted fragment file. The fragment file is then
sorted, and the sorted file is read and formed into
groups of equivalent sequences. The (.blk) file contains
the sequence and a pointer to the (.gptr) file, which
contains the pointers to the positions of the fragments
in the genome. The position in the genome includes the
chromosome number, the position in the chromosome and the
strand (i.e. positive or negative). A sample of these
files follows.
Sample of the (.blk) file for S. cerevisiae

Number of    Pointer to      27-base fragment
instances    (.gptr) file

    0             1          111111111111111111111111111
    1             2          111111123244233313332443414
    2             4          111111141113443133314333341
    1             5          111111232442333133324434141
    2             7          111111323311133323144423444
    2             9          111111332213331341414443413
    1            10          111111333444112343412323243
    9            19          111111333444113343412323243
    2            21          111111411134431333143333414
    2            23          111111443223134142124434124
    2            25          111112223234344444443144442
    8            33          111112244123441122214421213
    2            35          111112311241114344334134431
    1            36          111112324423331333244341414
    1            37          111112344232231344242234342
    1            38          111112433444244421144134211
    1            39          111112444311313442332142224
    1            40          111113131241131114424413231
    1            41          111113143332344311113133411
    2            43          111113233111333231444234441

In fragments above 1=G, 2=C, 3=A, 4=T

Sample of the (.gptr) file for S. cerevisiae

There are 16 chromosomes in S. cerevisiae

Item    Chromosome    Position         Direction
                      in Chromosome

  1         0                0             0
  2         4            11137             1
  3        12           467619             1
  4        12           458482             1
  5         4            11138             1
  6        12           465759             2
  7        12           456622             1
  8         1           219366             1
  9         8           539978             1
 10        14           522451             1
 11         4          1099073             1
 12         4          1210003             1
 13         7           539068             1
 14        12           654136             1
 15        12           596455             1
 16        15           121016             1
 17        15           598127             2
 18        16           847724             1
 19        16            59765             1
 20        12           467620             1
 21        12           458483             1
 22        12           461657             1
 23        12           452520             1
 24        13           838006             1
 25        15           288270             1
 26         4            83593             1
 27         4           992867             1
 28         6           162265             1
 29         7           845687             1
 30        10           531560             2
 31        15           282208             1
 32        16           860418             1
 33        16           572308             1
 34        12           465992             1
 35        12           456855             1
 36         4            11139             1
 37        [entry illegible in source]
 38         4            10302             1
 39         1            19894             2
 40        16             9311             1
 41        10           735203             1
 42        12           465760             1
 43        12           456623             1

In the direction column above, 1=positive strand and
2=negative strand
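
The blocking-fragment step (27-base frames on both strands, sorted and grouped into equivalent sequences) can be sketched in Python as follows. This is a minimal in-memory illustration, not the file-based program described in figure 10. The function names are hypothetical, the digit coding 1=G, 2=C, 3=A, 4=T is taken from the sample above, and recording a negative-strand frame at the position of its leftmost base on the positive strand is an assumption, since the source does not state its position convention.

```python
from collections import defaultdict

CODE = {"G": "1", "C": "2", "A": "3", "T": "4"}        # coding used in the sample
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
FRAME = 27                                             # frame length used in the text


def encode(seq):
    """Translate a base string into the digit coding of the sample."""
    return "".join(CODE[base] for base in seq)


def reverse_complement(seq):
    """Return the sequence as read on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def blocking_fragments(chromosomes, frame=FRAME):
    """Group every frame of both strands by sequence.

    chromosomes -- dict mapping chromosome number to its base string
    Returns a sorted list of (encoded fragment, [(chromosome,
    position, direction), ...]) with 1-based positions and
    direction 1 = positive strand, 2 = negative strand.
    """
    groups = defaultdict(list)
    for chrom, seq in chromosomes.items():
        for i in range(len(seq) - frame + 1):
            window = seq[i:i + frame]
            if set(window) - set(CODE):
                continue  # skip frames holding ambiguous bases such as N
            groups[encode(window)].append((chrom, i + 1, 1))
            groups[encode(reverse_complement(window))].append((chrom, i + 1, 2))
    return sorted(groups.items())
```

Each returned entry corresponds loosely to one row of the (.blk) sample together with its rows in the (.gptr) sample. A production version would sort on disk, as the text describes, because a whole genome yields far more fragments than this dictionary-based sketch is meant to hold.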







The level 2b diagram in figure 11 shows the computation
of the connectrons. The genome descriptors consist of
the number and length of the chromosomes. The algorithm
uses an array that represents several facts about each
base position in the genome. The level 3a diagram in
figure 13 shows the setup of the Genome-Usage memory.
The gene features are used to prevent the region of the
genome that codes for proteins from being used for the
connectron sequences (i.e. the T1s, the T2s, the C1s and
the C2s). In the level 2a diagram of figure 10, the
algorithm steps through each chromosome and within each
chromosome through each base position looking for
acceptable T1-windows of 27 bases. A T1-window can be
used to form a connectron relationship if there are two
or more instances of this fragment in the blocking
fragment file. The computation in the level 3b diagram
of figure 14 determines if the T1-window is acceptable or
not. Once an acceptable T1-window is found, the
algorithm (in the level 2a diagram of figure 10) looks
for acceptable T2-window positions that lie between 5,000
and 105,000 bases from the T1-window. The computation
for determining acceptable T2-window positions is done in
the level 3c diagram of figure 15. Once a pair of T1 and
T2 window positions is found, the algorithm looks among
the instances of these T1 and T2 sequences for a pair of
sequences C1 and C2 that lie within 200 bases of each
other on the same chromosome. The computation for
determining acceptable C1/C2 windows is shown in the
level 3d diagram in figure 16. In the level 3e diagram
of figure 17 the Genome-Usage memory is scanned for the
Possible-Connectrons. In the level 2c diagram of figure
12 the Possible-Connectrons are scanned to determine if
the C1/C2 sequences are within the Gap-Distance of a gene
on either the positive or the negative strands. The
Real-Connectrons are then written out in several
different files including the descriptions in the claims
section.
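
The search just described can be outlined in Python. The sketch below is a simplified, hypothetical rendering and not the flow-diagram program itself: it takes an in-memory index mapping each fragment to its (chromosome, position, direction) instances, omits the Genome-Usage memory and the exclusion of protein-coding regions, requires C1 and C2 to share a strand (an assumption), and uses a plain nested scan that would be too slow for a real genome.

```python
def find_possible_connectrons(index, min_loop=5_000, max_loop=105_000,
                              max_short=200):
    """Return (T1, T2, C1, C2) site tuples satisfying the window rules.

    index -- dict: fragment -> list of (chromosome, position, direction)
    """
    # Only fragments with two or more instances can act as both T and C.
    repeated = {f: sites for f, sites in index.items() if len(sites) >= 2}
    # Order all instances by chromosome, strand and position.
    located = sorted((chrom, strand, pos, frag)
                     for frag, sites in repeated.items()
                     for chrom, pos, strand in sites)
    results = []
    for i, (chrom, strand, p1, f1) in enumerate(located):
        for chrom2, strand2, p2, f2 in located[i + 1:]:
            # Stop once we leave the chromosome/strand or exceed the loop size.
            if (chrom2, strand2) != (chrom, strand) or p2 - p1 > max_loop:
                break
            if p2 - p1 < min_loop:
                continue
            t1, t2 = (chrom, p1, strand), (chrom, p2, strand)
            # Look among the other instances for a close C1/C2 pair.
            for c1 in repeated[f1]:
                if c1 == t1:
                    continue
                for c2 in repeated[f2]:
                    if c2 == t2:
                        continue
                    same_place = c1[0] == c2[0] and c1[2] == c2[2]
                    if same_place and abs(c2[1] - c1[1]) <= max_short:
                        results.append((t1, t2, c1, c2))
    return results
```

The tuples returned correspond to Possible-Connectrons only. Deciding which of them are Real-Connectrons still requires the gene feature tables and the Gap-Distance test of figure 12.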




Examples

The algorithm for the determination of optimal
connectrons has been applied to a number of different
publicly available genomes. The connectron is a tetradic
relationship between four sequence elements - T1, T2, C1
and C2. The claims presented in this section are written
by the program NearGene that implements the flow diagram
Level 2c of figure 12. The examples are written in a
uniform type of English. Each example contains some or
all of the following elements:

Name of genome
Description of T1
Length of T1-T2 loop
The chromosome on which the T1-T2 loop exists
The identifier number within the genome of the T1
sequence
The T1 sequence
Description of T2
The identifier number within the genome of the T2
sequence
The T2 sequence
A list of genes whose expression is controlled by
the T1-T2 loop
The common names of the genes as obtained from the
NCBI gene feature file (.ptt)
A list of C1/C2 short loops whose expression is
controlled by the T1-T2 loop
The chromosome on which the C1/C2 short loop exists
The common name of the gene which expresses the
C1/C2 short loop as an RNA
The sequence of the C1/C2 short loop
A list of C1/C2 short loops that control the
formation of the T1-T2 loop
The chromosome on which the C1/C2 short loop exists
The common name of the gene which expresses the
C1/C2 short loop as an RNA
The sequence of the C1/C2 short loop
The match between the C1/C2 sequence and the T1
sequence
The match between the C1/C2 sequence and the T2
sequence

The uniform descriptions make it possible to rapidly
comprehend the specifics in each example.

When a sequence element is very long, a series of four
dots has been inserted between the beginning and ending
sequence groups. A variable number of bases have been
deleted.



Index of Connectron Samples

Connectrons occur in prokaryotes, archea, single-
celled eukaryotes and multi-celled eukaryotes.

Many connectrons control the expression of one set
of genes in prokaryotes, archea, single-celled
eukaryotes and multi-celled eukaryotes.

One connectron controls the expression of many sets
of genes in prokaryotes, archea, single-celled
eukaryotes and multi-celled eukaryotes.

Connectrons occur between prokaryotes and their
plasmids.

Connectrons occur in plants and higher animals.

Permanent connectrons exist in prokaryotes, archea,
single-celled eukaryotes and multi-celled
eukaryotes.

Transient connectrons exist in prokaryotes, archea,
single-celled eukaryotes and multi-celled
eukaryotes.

Self-limiting connectrons occur in prokaryotes,
archea, single-celled eukaryotes and multi-celled
eukaryotes.

Geneless connectrons exist in single-celled and
multi-celled eukaryotes.

One connectron controls many geneless connectrons
in single-celled and multi-celled eukaryotes.



1. Connectrons occur in prokaryotes, archea,
single-celled eukaryotes and multi-celled
eukaryotes.

Connectrons exist as tetradic relationships where the
sequence T1 is equivalent to the sequence C1 (written
T1=C1) and where the sequence T2 equals the sequence C2
(written T2=C2), where T1 and T2 are DNA sequences 20 or
more bases in length, where the C1 sequence is adjacent
to the C2 sequence, where the T1 and T2 sequences are on
the same chromosome, and where the C1/C2 sequences are on
the same chromosome as T1 and T2 or where the C1/C2
sequences are on a chromosome different from T1 and T2.
The connectron relationship has been found to exist in
prokaryotes, archea, single-celled eukaryotes and multi-
celled eukaryotes.

Example of a prokaryote connectron - E. coli

In this example the existence of the T1-T2 (3197-3308)
long loop is controlled by three C1/C2 short loops (3307,
3432 and 2218). The T1-T2 long loop controls the
expression of 64 genes on chromosome 1 in addition to six
C1/C2 (3204, 3206, 3223, 3228, 3301 and 3327) short
loops. The C1/C2 short loop 3327 lies outside the range
of the T1-T2 long loop (3197-3308) but this C1/C2 is
expressed as a 3'UTR to the gene hemG that is within the
range of the T1-T2 long loop.
3307 Chromosome 1
3432 Chromosome 1
2218 Chromosome 1

   ★         ★         ★

         Chromosome 1

3197                    3308

     3204   3224   3301
     3206   3228   3327



Connectron control elements for chromosome 1 of the E.
coli genome

A double stranded DNA loop of length 93.542 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 3197. This T1 control element has
the DNA sequence

Seq. Id. = 1 Position = 1 to 175

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGGGTCAGCG
GGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCG
GGAA

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 3308. This T2
control element has the DNA sequence

Seq. Id. = 2 Position = 1 to 175

TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGG
AACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGA
AAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAGGCGTATTATGCACACCCCGCGC
CGCT

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

rrsC    gltU    rrlC    rrfC    aspT
trpT    yifA    yifE    yifB    ilvL
ilvG_1  ilvM    ilvE    ilvD    ilvA
ilvY    ilvC    ppiC    b3776   rep
gppA    rhlB    trxA    rhoL    rho
rfe     wzzE    wecB    rffH    wecD
wecE    wzxE    yifM_2  wecG    yifK
argX    hisR    leuT    proM    aslB
hemY    hemX    hemD    cyaA
cyaY    b3808   dapF    uvrD    b3814
corA    yigF    yigG    rarD    yigI
pldA    recQ    yigJ    yigK    pldB
yigL    yigM    metR    metE    ysgA
udp     yigN    ubiE    yigP    b3836
yigU    yigW_1  rfaH    yigC    ubiB
fadA    fadB    pepQ    trkH    hemG

This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops





A C1/C2 short loop on chromosome 1 whose identifier is
3204 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as a RNA single strand that is 3'UTR to the
gene rrsC and has the DNA sequence

Seq. Id. = 3 Position = 1 to 186



GATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGAT 
CCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACT 
CCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCA
TGCCGCGTGTATGAA 

A C1/C2 short loop on chromosome 1 whose identifier is
3206 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as a RNA single strand that is 3'UTR to the
gene rrsC and has the DNA sequence

Seq. Id. = 4 Position = 1 to 186

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAGGGGTTCG 
AATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTCACCTGCCTTAATAT 
CTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCA 
AGCTGAAAATTGAAA

A C1/C2 short loop on chromosome 1 whose identifier is
3223 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as a RNA single strand that is 3'UTR to the
gene rrlC and has the DNA sequence

Seq. Id. = 5 Position = 1 to 186

GCTGAAGTAGGTCCCAAGGGTATGGCTGTTCGCCATTTAAAGTGGTACGCGAGCTGG 
GTTTAGAACGTCGTGAGACAGTTCGGTCCCTATCTGCCGTGGGCGCTGGAGAACTGA
GGGGGGCTGCTCCTAGTACGAGAGGACCGGAGTGGACGCATCACTGGTGTTCGGGTT 
GTCATGCCAATGGCA 



A C1/C2 short loop on chromosome 1 whose identifier is
3225 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as a RNA single strand that is 3'UTR to the
gene rrlC and has the DNA sequence

Seq. Id. = 6 Position = 1 to 144

AAACAGAATTTGCCTGGCGGCCGTAGCGCGGTGGTCCCACCTGACCCCATGCCGAAC
TCAGAAGTGAAACGCCGTAGCGCCGATGGTAGTGTGGGGTCTCCCCATGCGAGAGTA 
GGGAACTGCCAGGCATCAAATTAAGCAGTA 

A C1/C2 short loop on chromosome 1 whose identifier is
3228 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as a RNA single strand that is 3'UTR to the
gene rrfC and has the DNA sequence

Seq. Id. = 7 Position = 1 to 112

GGTCATAAAACCGGTGGTTGTAAAAGAATTCGGTGGAGCGGTAGTTCAGTCGGTTAG 
AATACCTGCCTGTCACGCAGGGGGTCGCGGGTTCGAGTCCCGTCCGTTCCGCCAC 

A C1/C2 short loop on chromosome 1 whose identifier is
3301 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as a RNA single strand that is 3'UTR to the
gene ubiB and has the DNA sequence

Seq. Id. = 8 Position = 1 to 57



TTATCGTGCCTACAAATAGTCCGAACCGTAGGCCGGATAAGGCGTTTACGCCGCATC 

A C1/C2 short loop on chromosome 1 whose identifier is
3307 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as a RNA single strand that is 3'UTR to the
gene fadA and has the DNA sequence

Seq. Id. = 9 Position = 1 to 56

TGCCGGATGCGGCGTAAACGCCTTATCCGGCCTACGGTTCGGACTATTTGTAGGCA 

A C1/C2 short loop on chromosome 1 whose identifier is
3327 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as a RNA single strand that is 3'UTR to the
gene hemG and has the DNA sequence

Seq. Id. = 10 Position = 1 to 347

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT 
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGGGTCAGCG 
GGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCG 
GGAAGGCGTATTATG. . . CCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGT 
AGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCG
TAACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTACCTTAAAGAAGC 
GTTCTTTG 

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is
3307 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as a RNA
single strand that is 3'UTR to the gene hemG and has the
DNA sequence

Seq. Id. = 11 Position = 1 to 347

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT 
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGGGTCAGCG
GGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCG 
GGAAGGCGTATTATG. . . CCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGT 
AGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCG 
TAACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTACCTTAAAGAAGC 
GTTCTTTG

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 11 Position = 1 to 175

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT 
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGGGTCAGCG 
GGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCG 
GGAA

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 11 Position = 28 to 202



TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGG 
AACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGA 
AAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAGGCGTATTATGCACACCCCGCGC 
CGCT 
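
The reported match positions can be reproduced with an ordinary substring search. The Python sketch below is illustrative only: match_position is a hypothetical helper, positions are 1-based and inclusive as in the listings above, and only the first 57 bases of the C1/C2 sequence of Seq. Id. 11 and the first 30 bases of the T2 sequence are used, so the end position shown is that of the truncated target rather than the full 175-base match.

```python
def match_position(short_loop, target):
    """Return the 1-based inclusive (start, end) of target inside
    short_loop, or None if the target does not occur."""
    i = short_loop.find(target)
    if i < 0:
        return None
    return (i + 1, i + len(target))


# First line of the C1/C2 short loop sequence (Seq. Id. 11).
c1c2_prefix = "AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT"
# First 30 bases of the T2 control element (Seq. Id. 2).
t2_prefix = "TAAATTTCCTCTTGTCAGGCCGGAATAACT"

start, end = match_position(c1c2_prefix, t2_prefix)
# start is 28, agreeing with "Position = 28 to 202" for the T2 match.
```

In this example the T1 match (positions 1 to 175) and the T2 match (positions 28 to 202) overlap within the same C1/C2 sequence, which a plain substring search handles without special treatment.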

A C1/C2 short loop on chromosome 1 whose identifier is
3432 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as a RNA
single strand that is 3'UTR to the gene btuB and has the
DNA sequence

Seq. Id. = 12 Position = 1 to 335

TGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTAT 
AATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTC
TCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAGG 
CGTATTATGCACACC . . . ACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTA 
ACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAA 
GGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTACCTTAAAGAAGCGT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 12 Position = 1 to 169

TGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTAT 
AATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTC 
TCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCGGGAA 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 12 Position = 22 to 196

TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGG
AACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGA
AAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAGGCGTATTATGCACACCCCGCGC
CGCT

A C1/C2 short loop on chromosome 1 whose identifier is
2218 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene clpB and has the
DNA sequence

Seq. Id. = 13 Position = 1 to 72

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCA
AACACGCCGCCGGGC
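
Each listing in this section pairs a header of the form "Seq. Id. = n Position = a to b" with a sequence wrapped over several lines. As a minimal sketch, the stated span can be checked against the joined sequence. The function names `parse_header` and `check_listing` are hypothetical and are not part of the application, and the sketch assumes that positions are 1-based and inclusive, which agrees with the listings here but is not stated explicitly in the text.

```python
import re

# Pattern for the header convention used in the listings (assumed form).
HEADER = re.compile(r"Seq\. Id\. = (\d+) Position = (\d+) to (\d+)")


def parse_header(line):
    """Return (seq_id, start, end) from a 'Seq. Id. = n Position = a to b' line."""
    match = HEADER.search(line)
    if match is None:
        raise ValueError("not a sequence header: " + line)
    seq_id, start, end = (int(group) for group in match.groups())
    return seq_id, start, end


def check_listing(header, sequence_lines):
    """True when the joined sequence length equals the span stated in the header."""
    _, start, end = parse_header(header)
    sequence = "".join(part.strip() for part in sequence_lines)
    return len(sequence) == end - start + 1


# The clpB short loop (identifier 2218) as listed above.
header_13 = "Seq. Id. = 13 Position = 1 to 72"
lines_13 = [
    "CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCA",
    "AACACGCCGCCGGGC",
]
listing_ok = check_listing(header_13, lines_13)
print(listing_ok)
```

A check of this kind may help to spot listings whose header and sequence disagree, for example after transcription.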

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 13 Position = 1 to 72

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCA
AACACGCCGCCGGGC

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 13 Position = 1 to 21

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCA
AACACGCCGCCGGG



Example of an archea connectron - H. pylori 

In this example the existence of the T1-T2 (812-882)
long loop is controlled by three C1/C2 short loops (881,
813 and 1241). The T1-T2 long loop controls the
expression of 54 genes on chromosome 1 in addition to one
C1/C2 (843) short loop.

881 Chromosome 1
813 Chromosome 1
1241 Chromosome 1
  |
* * *
| Chromosome 1 |
812 882
| 842 |



Connectron control elements for chromosome 1 of H. pylori 
genome 

A double stranded DNA loop of length 96.385 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 812. This T1 control element has the
DNA sequence

Seq. Id. = 14 Position = 1 to 43

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA



This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 882. This T2
control element has the DNA sequence

Seq. Id. = 15 Position = 1 to 43

TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG



This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

HP0999 HP1000 HP1001 HP1002 HP1003
HP1005 HP1006 HP1008 HP1009 HPtRNA-Pro
HP1010 HP1011 HP1013 HP1015 HP1017
HP1018 HP1020 HP1021 HP1022 HP1023
HP1024 HP1025 HP1027 HP1028 HP1030
HP1031 HP1033 HP1034 HP1038 HP1039
HP1040 HP1041 HP1042 HP1043 HP1044
HP1045 HP1046 HP1051 HP1052 HP1055
HP1056 HP1058 HP1060 HP1065 HPtRNA-Ser
HP1066 HP1067 HP1069 HP1070 HP1074
HP1075 HP1076 HP1077 HP1078 HP1079
HP1080 HP1081 HP1083 HP1084 HP1085
HP1088 HP1091 HP1092 HP1093 HP1094
HP1095 HP1096










This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops



A C1/C2 short loop on chromosome 1 whose identifier is
813 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene HP0998 and has the DNA sequence

Seq. Id. = 16 Position = 1 to 70

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCATTCATCCCAAACAC
TAAAGATATTTGG

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is
881 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene HP1096 and has the DNA sequence

Seq. Id. = 17 Position = 1 to 70

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCATTCATCCCAAACAC
TAAAGATATTTGG

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 17 Position = 1 to 36

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 17 Position = 28 to 70

TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is
813 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene HP0998 and has
the DNA sequence

Seq. Id. = 18 Position = 1 to 70

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCATTCATCCCAAACAC
TAAAGATATTTGG

A C1/C2 short loop on chromosome 1 whose identifier is
881 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene HP1096 and has
the DNA sequence

Seq. Id. = 19 Position = 1 to 70

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCATTCATCCCAAACAC
TAAAGATATTTGG

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 19 Position = 1 to 43

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 19 Position = 28 to 70

TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG
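
The match positions reported for short loop 881 can be reproduced by locating the T1 (812) and T2 (882) sequences inside the C1/C2 sequence. The following is a minimal sketch only: the helper name `locate` is hypothetical, positions are assumed to be 1-based and inclusive, and a plain exact substring search is used, which is not necessarily the procedure used in the application.

```python
def locate(c1c2, target):
    """Return the 1-based inclusive (start, end) of target within c1c2, or None."""
    index = c1c2.find(target)
    if index < 0:
        return None
    return index + 1, index + len(target)


# C1/C2 short loop 881, joined from the two listing lines.
C1C2_881 = (
    "TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCATTCATCCCAAACAC"
    "TAAAGATATTTGG"
)
T1_812 = "TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA"
T2_882 = "TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG"

t1_span = locate(C1C2_881, T1_812)
t2_span = locate(C1C2_881, T2_882)
print(t1_span, t2_span)
```

Under these assumptions the two spans agree with the positions 1 to 43 and 28 to 70 given in the listings, and they show that the T1 and T2 matches share part of the C1/C2 sequence.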

A C1/C2 short loop on chromosome 1 whose identifier is
1241 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene HP1535 and has
the DNA sequence

Seq. Id. = 20 Position = 1 to 56

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCATTCATCCCAAACA

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 20 Position = 1 to 43

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 20 Position = 28 to 56

TAGCGGAACTAAAGCATTCATCCCAAACA

Example of a single-celled connectron - S. cerevisiae

In this example the existence of the T1-T2 (1352-1416)
long loop on chromosome 4 is controlled by one C1/C2
short loop (4213) on chromosome 10. The T1-T2 long loop
controls the expression of 34 genes on chromosome 4 in
addition to one C1/C2 (1356) short loop.

4213 Chromosome 10
  |
* * *
| Chromosome 4 |
1352 1416
| 1356 |



Connectron control elements for chromosome 1 of S.
cerevisiae genome

A double stranded DNA loop of length 68.908 kilo-bases on
chromosome 4 is bounded on the left by a T1 sequence
whose identifier is 1352. This T1 control element has
the DNA sequence

Seq. Id. = 21 Position = 1 to 37

TTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAA

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 1416. This T2
control element has the DNA sequence

Seq. Id. = 22 Position = 1 to 362

ATTAGATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCAACTATCATCTACT
AACTAGTATTTACGTTACTAGTATATTATCATATACGGTGTTAGAAGATGACGCAAA
TGATGAGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCAAGGATTGATAATGTA
ATAGGATCAATGAATATTAACATATAAAACGATGATAATAATATTTATAGAATTGTG
TAGAATTGCAGATTCCCTTTTATGGATTCCTAAATCCTTGAGGAGAACTTCTAGTAT
ATCTACATACCTAATATTATAGCCTTAATCACAATGGAATCCCAACAATTACATCAA
AATCCACATTCTCTACAGTA

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 



YDR170W-A YDR171W YDR172W YDR173C YDR174W
YDR175C YDR176W YDR177W YDR178W YDR179C
YDR179W-A YDR180W YDR181C YDR182W YDR183W
YDR184C YDR185C YDR186C YDR187C YDR188W
YDR189W YDR190C YDR191W YDR192C YDR193W
YDR194C YDR195W YDR196C YDR197W YDR198C
YDR199W YDR200C YDR201W YDR202C YDR203W
YDR204W YDR205W YDR206W YDR207C YDR208W
YDR209C YDR210W








This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops





A C1/C2 short loop on chromosome 4 whose identifier is
1356 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene YDR170W-A and has the DNA sequence

Seq. Id. = 23 Position = 1 to 311

AATCACACTAATCATTCTGATGATGAACTCCCTGGACACCTCCTTCTCGATTCAGGA
GCATCACGAACCCTTATAAGATCTGCTCATCACATACACTCAGCATCATCTAATCCT
GACATAAACGTAGTTGATGCTCAAAAAAGAAATATACCAATTAACGCTATTGGTGAC
CTACAATTTCACTTCCAGGACAACACCAAAACATCAATAAAGGTATTGCACACTCCT
AACATAGCCTATGACTTACTCAGTTTGAATGAATTGGCTGCAGTAGATATCACAGCA
TGCTTTACCAAAAACGTCTTAGAACG

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 10 whose identifier is
4213 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YJR029W and has
the DNA sequence

Seq. Id. = 24 Position = 1 to 346

ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCCACTATCGTCTATCAACTA
ATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGATGACATAAGTTAT
GAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAACGCAAGGATTGATAATGTAATAG
GATCAATGAATATAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGT
AGAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAGAACTTCTAGTATA
TTCTGTATACCTAATATTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCA
ACAT



The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 24 Position = 111 to 147

TTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 24 Position = 1 to 38

ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATC



Example of a multi-celled connectron - C. elegans 

In this example the existence of the T1-T2 (95-138) long
loop on chromosome 1 is controlled by three C1/C2 short
loops on chromosome 5 (21719, 21949 and 21655). The T1-T2
long loop controls the expression of four genes on
chromosome 1 in addition to seven C1/C2 (119, 122, 125,
130, 132, 134 and 136) short loops.

21719 Chromosome 5
21949 Chromosome 5
21655 Chromosome 5
  |
* * *
| Chromosome 1 |
95 138
| 119 122 125 130 132 134 136 |

A double stranded DNA loop of length 41.978 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 95. This T1 control element has the
DNA sequence

Seq. Id. = 25 Position = 1 to 55

CAGCACGTTCTTAACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGC

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 138. This T2
control element has the DNA sequence

Seq. Id. = 26 Position = 1 to 36

ACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATCA

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

Y73A3A.1 Y73A3A.1 ZC123.3 ZC123.2

This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops

A C1/C2 short loop on chromosome 1 whose identifier is
119 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene ZC123.3 and has the DNA sequence

Seq. Id. = 27 Position = 1 to 69

TTGAGAACTCTGCGTCTCAACTCCCGCATTTTTTGTAGATCTACGTAGATCAAACCG
AAATGGGACACT

A C1/C2 short loop on chromosome 1 whose identifier is
122 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene ZC123.3 and has the DNA sequence

Seq. Id. = 28 Position = 1 to 89

GCACGGGGTTCTGGCCTTCCTCATTGAATTTTTCGCGCTCCATTGACAATCGCCTGC
CGGACAACGCGTGGGAAAGTCGTGTACTCCAC

A C1/C2 short loop on chromosome 1 whose identifier is
125 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene ZC123.3 and has the DNA sequence

Seq. Id. = 29 Position = 1 to 89

ACGCGCCGTAAATCTACCCCAGATATGGCCGAGCCAAAATGGCCTAGTTCGGCAAAC
TCTTTCATTTCAATTTATGAGGGAAGCCAGAA

A C1/C2 short loop on chromosome 1 whose identifier is
130 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene ZC123.2 and has the DNA sequence

Seq. Id. = 30 Position = 1 to 121

CTCCCGCATTTTTTGTAGATCTACGTAGATCAAACCGAAATGAGGCACTTTCTGAAT
CCACGAGCTAGGCTTAAGCTTAGGCTTAAGCTTAGGCCTTTTCTCAGGCTTAGGCTT
AGGCTTA

A C1/C2 short loop on chromosome 1 whose identifier is
132 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene ZC123.2 and has the DNA sequence

Seq. Id. = 31 Position = 1 to 190

GCTTATGCTTGGGCTTAGGCTTAGGCGTAGGCTTAGGCTTAGGCTTAGGCTTATGCT 
TAGACTTAGTCTCACTATCAGTCTTAGGCTTAGGCTTAGACTTAGGCTTAAGCTTAG 
GCTTAAGCTTAGACTTAGGCTTAGGCTTAGGCTTAGGCTTAGGCTTAGGTTTGGGCT 
TAGGCTTAGGCTTAACCTC 

A C1/C2 short loop on chromosome 1 whose identifier is
134 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene ZC123.2 and has the DNA sequence



TCTGCGTCTTTTCTCCCGCATTTTTTGTAGATCTACGTAGATCAAACCGAAATGAGG 
CACTTTCTGAATCCACGAGCTAGGCTTAAGCTTAGGCTTAAGCTTAGGCCTTTTCTC 
AGGCTTAGGCTTAGGCTTA 

A C1/C2 short loop on chromosome 1 whose identifier is
136 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene ZC123.2 and has the DNA sequence

Seq. Id. = 33 Position = 1 to 190

GCTTATGCTTGGGCTTAGGCTTAGGCGTAGGCTTAGGCTTAGGCTTAGGCTTATGCT
TAGACTTAGTCTCACTATCAGTCTTAGGCTTAGGCTTAGACTTAGGCTTAAGCTTAG
GCTTAAGCTTAGACTTAGGCTTAGGCTTAGGCTTAGGCTTAGGCTTAGGTTTGGGCT
TAGGCTTAGGCTTAACCTC

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 5 whose identifier is
21719 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene C39F7.5 and has
the DNA sequence

Seq. Id. = 34 Position = 1 to 65

ACGTTCTTAACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGCATTTTT
TGTAGATC



The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 34 Position = 1 to 51

ACGTTCTTAACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGC

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 34 Position = 31 to 65

ACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATC
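
For short loop 21719 the T1 match (positions 1 to 51) and the T2 match (positions 31 to 65) cover a common stretch of the C1/C2 sequence. As a small illustrative sketch, the shared stretch can be computed from the two spans alone. The function name `shared_span` is hypothetical, and the spans are assumed to be 1-based and inclusive.

```python
def shared_span(first, second):
    """Return the 1-based inclusive overlap of two (start, end) spans, or None."""
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    if start > end:
        return None
    return start, end


T1_MATCH = (1, 51)   # T1 match within Seq. Id. 34
T2_MATCH = (31, 65)  # T2 match within Seq. Id. 34

common = shared_span(T1_MATCH, T2_MATCH)
common_length = common[1] - common[0] + 1
print(common, common_length)
```

With the spans given for Seq. Id. 34, the shared stretch is positions 31 to 51, that is 21 bases that belong to both the T1 match and the T2 match.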

A C1/C2 short loop on chromosome 5 whose identifier is
21949 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene F16B4.4 and has
the DNA sequence

Seq. Id. = 35 Position = 1 to 95

ACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATCT
ACGTAGATCAAGCCGAAATGAGACACTCTGACACCACG

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 35 Position = 1 to 42

ACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGC



The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 35 Position = 22 to 63

ACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATC

A C1/C2 short loop on chromosome 5 whose identifier is
21655 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene C39F7.3 and has
the DNA sequence

Seq. Id. = 36 Position = 1 to 61

AACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATC
TACG

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 36 Position = 1 to 36

AACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGC

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 36 Position = 23 to 57

ACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATC



2. Many Connectrons control the expression of
one set of genes in prokaryotes, archea,
single-celled eukaryotes and multi-celled eukaryotes.

Many different C1/C2 short loops can control the
existence of one T1-T2 long loop. The C1/C2 short loops
can be on the same chromosome or on different chromosomes
from the T1-T2 long loop. This relationship is described
as "many-to-one". This relationship exists in
prokaryotes, archea, single-celled eukaryotes and
multi-celled eukaryotes.

Example of a many-to-one connectron in prokaryotes - E. 
coli 

In this example the existence of the T1-T2 (3197-3308)
long loop is controlled by three C1/C2 short loops (3307,
3432 and 2218).

3307 Chromosome 1
3432 Chromosome 1
2218 Chromosome 1
  |
* * *
| Chromosome 1 |
3197 3308
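
The many-to-one relationship in this example can be written down as a simple grouping of short loop identifiers under the long loop they control. The sketch below is illustrative only: the names `CONTROL_PAIRS`, `group_by_long_loop` and `is_many_to_one` are hypothetical, and a long loop is represented here by its pair of T1 and T2 identifiers, which is an assumption made for the sketch rather than a representation taken from the application.

```python
from collections import defaultdict

# (short loop identifier, (T1 identifier, T2 identifier)) pairs from this example.
CONTROL_PAIRS = [
    (3307, (3197, 3308)),
    (3432, (3197, 3308)),
    (2218, (3197, 3308)),
]


def group_by_long_loop(pairs):
    """Map each long loop to the list of short loops that control it."""
    grouped = defaultdict(list)
    for short_loop, long_loop in pairs:
        grouped[long_loop].append(short_loop)
    return dict(grouped)


def is_many_to_one(grouped, long_loop):
    """True when more than one short loop controls the given long loop."""
    return len(grouped.get(long_loop, [])) > 1


grouped = group_by_long_loop(CONTROL_PAIRS)
print(grouped)
print(is_many_to_one(grouped, (3197, 3308)))
```

Keying the mapping on the long loop keeps the "many" side as a list, so further short loops can be appended without changing the structure.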



A double stranded DNA loop of length 93.542 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 3197. This T1 control element has
the DNA sequence

Seq. Id. = 37 Position = 1 to 175

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGGGTCAGCG
GGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCG
GGAA

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 3308. This T2
control element has the DNA sequence

Seq. Id. = 38 Position = 1 to 175

TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGG
AACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGA
AAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAGGCGTATTATGCACACCCCGCGC
CGCT



This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

rrsC gltU rrlC rrfC aspT
trpT yifA yifE yifB ilvL
ilvG_1 ilvM ilvE ilvD ilvA
ilvY ilvC ppiC b3776 rep
gppA rhlB trxA rhoL rho
rfe wzzE wecB rffH wecD
wecE wzxE yifiyi_2 wecG yifK
argX hisR leuT proM aslB
aslA hemY hemX hemD cyaA
cyaY b3808 dapF uvrD b3814
corA yigF yigG rarD yigI
pldA recQ yigJ yigK pldB
yigL yigM metR metE ysgA
udp yigN ubiE yigP b3836
yigU yigW_1 rfaH yigC ubiB
fadA fadB pepQ trkH hemG

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is
3307 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene hemG and has the
DNA sequence

Seq. Id. = 39 Position = 1 to 440

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGGGTCAGCG
GGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCG
GGAAGGCGTATTATG...GGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAG
TAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCC
GTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCG
CTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGA
ACCTGCGGTTGGATCACCTCCTTACCTTAAAGAAGCGTTCTTTG

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 39 Position = 1 to 175

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGGGTCAGCG
GGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCG
GGAA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 39 Position = 28 to 192

TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGG
AACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGA
AAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAGGCGTATTATGCACACCCCGCGC
CGCT

A C1/C2 short loop on chromosome 1 whose identifier is
3432 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene btuB and has the
DNA sequence

Seq. Id. = 40 Position = 1 to 335

TGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTAT
AATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTC
TCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAGG
CGTATTATGCACACC...ACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTA
ACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAA
GGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTACCTTAAAGAAGCGT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 40 Position = 1 to 169

TGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTAT
AATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTC
TCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCGGGAA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 40 Position = 22 to 196

TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGG
AACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGA
AAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAGGCGTATTATGCACACCCCGCGC
CGCT

A C1/C2 short loop on chromosome 1 whose identifier is
2218 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene clpB and has the
DNA sequence

Seq. Id. = 41 Position = 1 to 72

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCA
AACACGCCGCCGGGC

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 41 Position = 1 to 72

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCA
AACACGCCGCCGGGC

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 41 Position = 1 to 72

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCA
AACACGCCGCCGGGC



Example of a many-to-one connectron in archea - M.
jannaschii

In this example the existence of the T1-T2 (1630-1643)
long loop is controlled by four C1/C2 short loops (1629,
1642, 124 and 1533).

1629 Chromosome 1
1642 Chromosome 1
124 Chromosome 1
1533 Chromosome 1
  |
* * *
| Chromosome 1 |
1630 1643



A double stranded DNA loop of length 4.998 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 1630. This T1 control element has
the DNA sequence

Seq. Id. = 42 Position = 1 to 175

TTATTAATTAGTTCAAAGGATTTTTATTTAATTTCTAAGGGTTTGCTGGTTTGATTA
TTTAGAATATTTGAGTTTATTGAATTATTCAGATTTTTAAAAATTAAGATTAATTAG
GAAAGGAAATAAGATTTCTCTAACAGACAAGTTAAATTTTTGGATTTAAAAAGATAA
AAAT

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 1643. This T2
control element has the DNA sequence

Seq. Id. = 43 Position = 1 to 175

TTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAATTA
TTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAATTTCTCTAACAAA
TAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCTGTTTTATTATGGAAAGAA
AGAT

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

MJ1597 MJ1598 MJ1599 MJ1600 MJ1601
MJ1602



The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is
1629 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ1597 and has
the DNA sequence

Seq. Id. = 44 Position = 1 to 139

ATATGTTTGAAATTTGAAAATAAGAGTATTTAGAAGTTATTAATTAGTTCAAAGGAT
TTTTATTTAATTTCTAAGGGTTTGCTGGTTTGATTATTTAGAATATTTGAGTTTATT
GAATTATTCAGATTTTTAAAAATTA

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 44 Position = 37 to 139

TTATTAATTAGTTCAAAGGATTTTTATTTAATTTCTAAGGGTTTGCTGGTTTGATTA
TTTAGAATATTTGAGTTTATTGAATTATTCAGATTTTTAAAAATTA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 44 Position = 81 to 139

GCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAATTATTCAGATTTTTAAAAAT
TA

A C1/C2 short loop on chromosome 1 whose identifier is
1642 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ1602 and has
the DNA sequence

Seq. Id. = 45 Position = 1 to 177

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAAT
TATTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAATTTCTCTAACA
AATAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCTGTTTTATTATGGAAAG
AAAGAT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 45 Position = 20 to 78

GCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAATTATTCAGATTTTTAAAAAT
TA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 45 Position = 3 to 177

TTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAATTA
TTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAATTTCTCTAACAAA
TAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCTGTTTTATTATGGAAAGAA
AGAT

A C1/C2 short loop on chromosome 1 whose identifier is
124 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ0112 and has
the DNA sequence

Seq. Id. = 46 Position = 1 to 75

ATTTAATTTCTAAGGGTTTGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAAT
TATTCAGATTTTTAAAAT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 46 Position = 1 to 75

ATTTAATTTCTAAGGGTTTGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAAT
TATTCAGATTTTTAAAAT

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 46 Position = 20 to 75

GCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAATTATTCAGATTTTTAAAAAT

A C1/C2 short loop on chromosome 1 whose identifier is
1533 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ1486 and has
the DNA sequence

Seq. Id. = 47 Position = 1 to 58

TTTTTATTTAATTTCTAAGGGTTTGCTGGTTTGATTATTTAGAATATTTGAGTTTAT
T

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 47 Position = 1 to 58

TTTTTATTTAATTTCTAAGGGTTTGCTGGTTTGATTATTTAGAATATTTGAGTTTAT
T

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 47 Position = 25 to 58

GCTGGTTTGATTATTTAGAATATTTGAGTTTATT



Example of a many-to-one connectron in single-cell
eukaryotes - S. cerevisiae

In this example the existence of the T1-T2 (5515-5533)
long loop on chromosome 12 is controlled by seventeen
C1/C2 short loops (5516, 5532, 1939, 2323, 1942, 3286,
3649, 4764, 4751, 5536, 6102, 8023, 7356, 3293, 3291,
3289 and 146).

5516 Chromosome 12
5532 Chromosome 12
1939 Chromosome 4
2323 Chromosome 5
1942 Chromosome 5
3286 Chromosome 7
3649 Chromosome 8
4764 Chromosome 12
4751 Chromosome 12
5536 Chromosome 13
6102 Chromosome 14
8023 Chromosome 16
7356 Chromosome 16
3293 Chromosome 8
3291 Chromosome 8
3289 Chromosome 8
146 Chromosome 2



| Chromosome 12 |

3197 330J 
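
The chromosome assignments listed above for the seventeen short loops can be split into those that lie on the same chromosome as the long loop (chromosome 12) and those that lie on a different chromosome. This is a minimal sketch: the names `SHORT_LOOP_CHROMOSOME` and `split_by_chromosome` are hypothetical and are used for illustration only.

```python
# Short loop identifier -> chromosome, as listed for the 5515-5533 long loop.
SHORT_LOOP_CHROMOSOME = {
    5516: 12, 5532: 12, 1939: 4, 2323: 5, 1942: 5, 3286: 7,
    3649: 8, 4764: 12, 4751: 12, 5536: 13, 6102: 14, 8023: 16,
    7356: 16, 3293: 8, 3291: 8, 3289: 8, 146: 2,
}


def split_by_chromosome(short_loops, long_loop_chromosome):
    """Return (same, different): sorted identifiers on and off the long loop's chromosome."""
    same = sorted(
        ident for ident, chrom in short_loops.items()
        if chrom == long_loop_chromosome
    )
    different = sorted(
        ident for ident, chrom in short_loops.items()
        if chrom != long_loop_chromosome
    )
    return same, different


same, different = split_by_chromosome(SHORT_LOOP_CHROMOSOME, 12)
print(same)
print(len(different))
```

For this example four of the seventeen short loops (4751, 4764, 5516 and 5532) are on chromosome 12 and the remaining thirteen are on other chromosomes.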



A double stranded DNA loop of length 6.466 kilo-bases on
chromosome 12 is bounded on the left by a T1 sequence
whose identifier is 5515. This T1 control element has
the DNA sequence

Seq. Id. = 48 Position = 1 to 225

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 5533. This T2
control element has the DNA sequence

Seq. Id. = 49 Position = 1 to 225

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

YLR467W



This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops

A C1/C2 short loop on chromosome 12 whose identifier is
5516 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene YLR464W and has the DNA sequence

Seq. Id. = 50 Position = 1 to 252

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC
ATCCGGGTAAGAGACAACAGGGCT

A C1/C2 short loop on chromosome 12 whose identifier is
5532 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene YLR467W and has the DNA sequence

Seq. Id. = 51 Position = 1 to 252

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC
ATCCGGGTAAGAGACAACAGGGCT

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 4 whose identifier is
1939 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YDR545W and has
the DNA sequence

Seq. Id. = 52 Position = 1 to 222

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGG 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 52 Position = 1 to 222 

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGG 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 52 Position = 28 to 222

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA
ATAAAGGTAGTAAGTAGCTTTTGG

A C1/C2 short loop on chromosome 5 whose identifier is
2323 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YER189W and has
the DNA sequence

Seq. Id. = 53 Position = 1 to 252

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC
ATCCGGGTAAGAGACAACAGGGCT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 53 Position = 1 to 225

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG



The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 53 Position = 28 to 252

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC 
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC 
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT

A C1/C2 short loop on chromosome 5 whose identifier is 
1942 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YEL077C and has
the DNA sequence

Seq. Id. = 54 Position = 1 to 252

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGGCT 



The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 54 Position = 1 to 225



AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 54 Position = 28 to 252

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 

A C1/C2 short loop on chromosome 7 whose identifier is 
3286 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA
single strand that is 3'UTR to the gene YGR296W and has 
the DNA sequence 

Seq. Id. = 55 Position = 1 to 252

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 55 Position = 1 to 225

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 55 Position = 28 to 252

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC 
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA 
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 

A C1/C2 short loop on chromosome 8 whose identifier is
3649 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YHR219W and has 
the DNA sequence 

Seq. Id. = 56 Position = 1 to 252



AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGGCT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 56 Position = 1 to 225

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 56 Position = 28 to 252

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC 
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC 
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA 
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT

A C1/C2 short loop on chromosome 12 whose identifier is 
4764 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YLL066C and has
the DNA sequence

Seq. Id. = 57 Position = 1 to 252



AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 57 Position = 1 to 225

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 57 Position = 28 to 252

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA 
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 

A C1/C2 short loop on chromosome 12 whose identifier is 
4751 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as a RNA
single strand that is 3'UTR to the gene YLL067C and has
the DNA sequence

Seq. Id. = 58 Position = 1 to 252

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGGCT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 58 Position = 1 to 225

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 58 Position = 28 to 252

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC 
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC 
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA 
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT



A C1/C2 short loop on chromosome 13 whose identifier is 
5536 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YML133C and has 
the DNA sequence

Seq. Id. = 59 Position = 1 to 252

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 59 Position = 1 to 225

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 59 Position = 28 to 252

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT

A C1/C2 short loop on chromosome 14 whose identifier is 
6102 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YNL339C and has 
the DNA sequence 

Seq. Id. = 60 Position = 1 to 252

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 60 Position = 1 to 225

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG 



The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 60 Position = 28 to 252

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC 
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC 
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA 
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 


A C1/C2 short loop on chromosome 16 whose identifier is 
8023 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YPR204W and has
the DNA sequence

Seq. Id. = 61 Position = 1 to 252 

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 61 Position = 1 to 225

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 61 Position = 28 to 252

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA 
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 

A C1/C2 short loop on chromosome 16 whose identifier is 
7356 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YPL283C and has 
the DNA sequence 

Seq. Id. = 62 Position = 1 to 252

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC
ATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 62 Position = 1 to 225



AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG 
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA 
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG 


The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 62 Position = 28 to 252

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC 
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC 
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA 
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 


A C1/C2 short loop on chromosome 8 whose identifier is 
3293 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YHL050C and has 
the DNA sequence

Seq. Id. = 63 Position = 1 to 89

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 63 Position = 1 to 89



AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAATTTTTTTTTCTAGGGAATATGCGTTTT 

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 63 Position = 28 to 89

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC 
GTTTT 

A C1/C2 short loop on chromosome 8 whose identifier is 
3291 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YHL050C and has
the DNA sequence

Seq. Id. = 64 Position = 1 to 87

ATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGA 
CAAGTGGGAAAGAGTAGGATAAAAAGACAA 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 64 Position = 1 to 87

ATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGA 
CAAGTGGGAAAGAGTAGGATAAAAAGACAA 

The match between the T2 sequence and the C1/C2 sequence
is


ATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGA 
CAAGTGGGAAAGAGTAGGATAAAAAGACAA 

A C1/C2 short loop on chromosome 2 whose identifier is 
145 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YBL113C and has
the DNA sequence

Seq. Id. = 65 Position = 1 to 73

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGG 
TAAGAGACAACAGGCT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 65 Position = 1 to 47

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 65 Position = 1 to 73

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGG 
TAAGAGACAACAGGCT 



A C1/C2 short loop on chromosome 8 whose identifier is 
3289 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YHL050C and has 
the DNA sequence

Seq. Id. = 66 Position = 1 to 73

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGG
TAAGAGACAACAGGCT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 66 Position = 1 to 47

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 66 Position = 1 to 73

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGG
TAAGAGACAACAGGCT

A C1/C2 short loop on chromosome 2 whose identifier is 
146 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene YBL113C and has
the DNA sequence

Seq. Id. = 67 Position = 1 to 62

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAA 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 67 Position = 1 to 62

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA 
AGAAA 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 67 Position = 28 to 62

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAA



Example of a many-to-one connectron in multi-cell 
eukaryotes - C. elegans 


In this example the existence of the T1-T2 (3197-3308) 
long loop on chromosome 5 is controlled by three C1/C2 
short loops (4382, 4375 and 28633) . 

[Diagram: the C1/C2 short loops 4382 (chromosome 1), 4375
(chromosome 1) and 28633 (chromosome 5) control the T1-T2
long loop 28632-28697 on chromosome 5.]

A double stranded DNA loop of length 58.451 kilo-bases on
chromosome 5 is bounded on the left by a T1 sequence
whose identifier is 28632. This T1 control element has
the DNA sequence

Seq. Id. = 68 Position = 1 to 86

GCAAAAATTGACTGAAAATTTGAATTTCCCGCAAAAAATTGACTGAAAATTTGAATT
TCCCGCCAAAAATTGACTGAAAATTTGAA
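
Each listing pairs a "Seq. Id. = n Position = a to b" header with the bases themselves, so the span b - a + 1 should equal the number of bases printed. The following is a minimal sketch of that consistency check, assuming the sequence is held as wrapped text lines. The function name `span_matches_listing` is hypothetical and is not part of the method described in this application.

```python
def span_matches_listing(start: int, end: int, lines: list[str]) -> bool:
    """Return True if the 1-based inclusive range start..end has the same
    length as the bases in `lines` (any whitespace is ignored)."""
    bases = "".join("".join(line.split()) for line in lines)
    return end - start + 1 == len(bases)


# Seq. Id. = 68 (T1 control element 28632), listed as Position = 1 to 86.
SEQ_68 = [
    "GCAAAAATTGACTGAAAATTTGAATTTCCCGCAAAAAATTGACTGAAAATTTGAATT",
    "TCCCGCCAAAAATTGACTGAAAATTTGAA",
]

print(span_matches_listing(1, 86, SEQ_68))
```

The same check can be run on any header and listing pair in this section to flag a listing whose printed bases do not fill the stated range.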

This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 28697. This T2 
control element has the DNA sequence 

Seq. Id. = 69 Position = 1 to 160

CAAAAAATTGACTGAAAATTTGAATTTCCCTCCAAAAATTGACTGAAAATTTGAATT 
TCCCGCCAAAAATTGACTGAAAATTTGAATATCCCGCCAAAAATTGACTGAAAATTT 
GAATTTCCCGCCGAAAATTAAATGAAAAATGGAATTTCTCGCCGAA

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 

M162.8 M162.4 M162.3 M162.6 M162.2

M162.1 M162.7



The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 



A C1/C2 short loop on chromosome 1 whose identifier is 
4382 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene Y43F8B.10 and has
the DNA sequence

Seq. Id. = 70 Position = 1 to 319

ATTATAGAAAATTTAAATTTCCCTCCAAAAAATTGACTGAAAATTTGAATTTCCCTC
CAAAAATTGACTGAAAATTTGAATTTCCCGCCAAAAATTGACTGAAAATTTGAATAT
CCCGCCAAAAATTGACTGAAAATTTGAATTTCCCGCCGAAAATTAAATGAAAAATGG
AATTTCTCGCCGAAAAATTCAGTAAAAATTTGAATTTCCTGCCAAAAATTGACTGAA
AATTTGAATTTCTTGCCAAAAAAGTGACTGGGAATTTGAATTTCCCTCCAAAAATTG
ACTGAAATTTTGAATTTCCCGCTAAAAGTTGACT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 70 Position = 58 to 88

CAAAAATTGACTGAAAATTTGAATTTCCCGC 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 70 Position = 26 to 185

CAAAAAATTGACTGAAAATTTGAATTTCCCTCCAAAAATTGACTGAAAATTTGAATT
TCCCGCCAAAAATTGACTGAAAATTTGAATATCCCGCCAAAAATTGACTGAAAATTT
GAATTTCCCGCCGAAAATTAAATGAAAAATGGAATTTCTCGCCGAA
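
The positions reported for each match read as 1-based, inclusive offsets into the C1/C2 sequence. As an illustrative sketch only (the application does not state how the positions are computed, and `match_positions` is a hypothetical helper), an exact substring search reproduces the T1 match reported above for Seq. Id. 70. Only the first two printed lines (bases 1 to 114) of the C1/C2 sequence are used here.

```python
def match_positions(control_seq: str, target: str) -> list[tuple[int, int]]:
    """Return every 1-based inclusive (start, end) position at which
    `target` occurs exactly in `control_seq` (overlaps allowed)."""
    hits = []
    start = control_seq.find(target)
    while start != -1:
        hits.append((start + 1, start + len(target)))
        start = control_seq.find(target, start + 1)
    return hits


# First two printed lines (bases 1..114) of the C1/C2 sequence Seq. Id. 70.
SEQ_70_PREFIX = (
    "ATTATAGAAAATTTAAATTTCCCTCCAAAAAATTGACTGAAAATTTGAATTTCCCTC"
    "CAAAAATTGACTGAAAATTTGAATTTCCCGCCAAAAATTGACTGAAAATTTGAATAT"
)
# The T1 match listed for Seq. Id. 70.
T1_MATCH = "CAAAAATTGACTGAAAATTTGAATTTCCCGC"

print(match_positions(SEQ_70_PREFIX, T1_MATCH))  # [(58, 88)]
```

Because the sequence is built from short repeats, the same fragment can occur more than once in the full 319-base sequence, so a search of this kind returns a list rather than a single position.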



A C1/C2 short loop on chromosome 1 whose identifier is 
4375 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene Y43F8B.10 and has
the DNA sequence

Seq. Id. = 71 Position = 1 to 319

ATTATAGAAAATTTAAATTTCCCTCCAAAAAATTGACTGAAAATTTGAATTTCCCTC 
CAAAAATTGACTGAAAATTTGAATTTCCCGCCAAAAATTGACTGAAAATTTGAATAT
CCCGCCAAAAATTGACTGAAAATTTGAATTTCCCGCCGAAAATTAAATGAAAAATGG 
AATTTCTCGCCGAAAAATTCAGTAAAAATTTGAATTTCCTGCCAAAAATTGACTGAA 
AATTTGAATTTCTTGCCAAAAAAGTGACTGGGAATTTGAATTTCCCTCCAAAAATTG 
ACTGAAATTTTGAATTTCCCGCTAAAAGTTGACT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 71 Position = 58 to 88

CAAAAATTGACTGAAAATTTGAATTTCCCGC 

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 71 Position = 58 to 217

CAAAAAATTGACTGAAAATTTGAATTTCCCTCCAAAAATTGACTGAAAATTTGAATT
TCCCGCCAAAAATTGACTGAAAATTTGAATATCCCGCCAAAAATTGACTGAAAATTT
GAATTTCCCGCCGAAAATTAAATGAAAAATGGAATTTCTCGCCGAA



A C1/C2 short loop on chromosome 5 whose identifier is 
28633 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene M162.5 and has 
the DNA sequence

Seq. Id. = 72 Position = 1 to 85

CAAAAATTGACTGAAAATTTGAATTTCCCGCAAAAAATTGACTGAAAATTTGAATTT
CCCGCCAAAAATTGACTGAAAATTTGAA

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 72 Position = 1 to 85

CAAAAATTGACTGAAAATTTGAATTTCCCGCAAAAAATTGACTGAAAATTTGAATTT
CCCGCCAAAAATTGACTGAAAATTTGAA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 72 Position = 31 to 60

CAAAAAATTGACTGAAAATTTGAATTTCCC



3. One connectron controls the expression of
many sets of genes in prokaryotes, archea,
single-celled eukaryotes and multi-celled
eukaryotes.

One C1/C2 short loop can control the existence of many
T1-T2 long loops. The C1/C2 short loop can be on the
same chromosome as, or on a different chromosome from, the
T1-T2 long loops. This relationship is described as
"one-to-many". This relationship exists in prokaryotes,
archea, single-celled eukaryotes and multi-celled
eukaryotes.



Example of a one-to-many connectron in prokaryotes - E.
coli



In this example the existence of the T1-T2 (3208-3315,
3436-3476, 3439-3478 and 3441-3479) long loops is
controlled by one C1/C2 short loop (3206).
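
The one-to-many relationship can be pictured as a mapping from one C1/C2 short loop identifier to a list of (T1, T2) identifier pairs. The sketch below uses only the identifiers named in this example. The container layout, the function names and the "one-to-one" and "none" labels are assumptions made for illustration and are not taken from the application.

```python
from collections import defaultdict

# (C1/C2 short loop id, (T1 id, T2 id)) pairs from the E. coli example.
CONTROLS = [
    (3206, (3208, 3315)),
    (3206, (3436, 3476)),
    (3206, (3439, 3478)),
    (3206, (3441, 3479)),
]


def group_by_short_loop(pairs):
    """Collect the T1-T2 long loops controlled by each C1/C2 short loop."""
    loops = defaultdict(list)
    for short_loop, long_loop in pairs:
        loops[short_loop].append(long_loop)
    return dict(loops)


def relationship(pairs, short_loop):
    """Label a short loop by how many long loops it controls."""
    count = len(group_by_short_loop(pairs).get(short_loop, []))
    if count == 0:
        return "none"
    return "one-to-many" if count > 1 else "one-to-one"


print(group_by_short_loop(CONTROLS)[3206])
print(relationship(CONTROLS, 3206))
```

Grouping the same pairs by long loop instead of by short loop would give the many-to-one view used in the preceding section.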

[Diagram: the C1/C2 short loop 3206 on chromosome 1
controls four T1-T2 long loops on chromosome 1:
3208-3315, 3436-3476, 3439-3478 and 3441-3479.]



A double stranded DNA loop of length 93.377 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 3208. This T1 control element has the
DNA sequence

Seq. Id. = 73 Position = 1 to 340

ACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCAAGCTGA
AAATTGAAACACTGAACAACGAAAGTTGTTCGTGAGTCTCTCAAATTTTCGCAACAC 
GATGATGAATCGAAAGAAACATCTTCGGGTTGTGAGGTTAAGCGACTAAGCGTACAC 
GGTGGATGCCCTGGC . . . AGTGTGTTTCGACACACTATCATTAACTGAATCCATAGG 
TTAATGAGGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCA
ACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAGCAGCCCAGAGCCTGAATCAG
T 

This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 3315. This T2 
control element has the DNA sequence

Seq. Id. = 74 Position = 1 to 330

TTTGCTCTTTAAAAATCTGGATCAAGCTGAAAATTGAAACACTGAACAACGAAAGTT 
GTTCGTGAGTCTCTCAAATTTTCGCAACTCTGAAGTGAAACATCTTCGGGTTGTGAG
GTTAAGCGACTAAGCGTACACGGTGGATGCCCTGGCAGTCAGAGGCGATGAAGGACG
TGCTAATCTGCGATA. . . GGTTAATGAGGCGAACCGGGGGAACTGAAACATCTAAGT 
ACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAG 
CAGCCCAGAGCCTGAATCAGTGTGTGTGTTAGTGGAAGCGTCTGGAAA 

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

rrlC rrfC aspT trpT hemG

rrsA ileT

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 
3206 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA
single strand that is 3'UTR to the gene rrsC and has the
DNA sequence

Seq. Id. = 75 Position = 1 to 367

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAGGGGTTCG 
AATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTCACCTGCCTTAATAT
CTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCA 
AGCTGAAAATTGAAA. . . ACCGGCGATTTCCGAATGGGGAAACCCAGTGTGTTTCGA 
CACACTATCATTAACTGAATCCATAGGTTAATGAGGCGAACCGGGGGAACTGAAACA 
TCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAA 
CGGGGAGCAGCCCAGAGCCTGAATCAGT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 75 Position = 121 to 367

ACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCAAGCTGA 
AAATTGAAACACTGAACAACGAAAGTTGTTCGTGAGTCTCTCAAATTTTCGCAACAC 
GATGATGAATCGAAAGAAACATCTTCGGGTTGTGAGGTTAAGCGACTAAGCGTACAC 
GGTGGATGCCCTGGC . . . AGTGTGTTTCGACACACTATCATTAACTGAATCCATAGG
TTAATGAGGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCA 
ACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAGCAGCCCAGAGCCTGAATCAG 
T 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 75 Position = 148 to 232

TTTGCTCTTTAAAAATCTGGATCAAGCTGAAAATTGAAACACTGAACAACGAAAGTT
GTTCGTGAGTCTCTCAAATTTTCGCAAC 



A double stranded DNA loop of length 41.279 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 3436. This T1 control element has
the DNA sequence

Seq. Id. = 76 Position = 1 to 113

ACGCAACGCGTGATAAGCAATTTTCGTGTCCCCTTCGTCTAGAGGCCCAGGACACCG
CCCTTTCACGGCGGTAACAGGGGTTCGAATCCCCTAGGGGACGCCACTTGCTGGTT 

This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 3476. This T2 
control element has the DNA sequence

Seq. Id. = 77 Position = 1 to 150

AGTGAAAAGCAAGGCGTCTTGCGAAGCAGACTGATACGTCCCCTTCGTCTAGAGGCC
CAGGACACCGCCCTTTCACGGCGGTAACAGGGGTTCGAATCCCCTAGGGGACGCCAC
TTGCTGGTTTGTGAGTGAAAGTCACCTGCCTTAATA 






This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 



gltT rrlB rrfB murB coaA 

b3975 tyrU thrT tufB secE 

nusG rplK rplA rplJ rplL 

rpoB rpoC htrC thiH thiF 

thiE yjaE yjaD hemE nfi

yjaG hupA yjaH yjaI hydH

purD purH 



This long T1/T2 double stranded DNA loop modulates the 
expression of the following C1/C2 short loops 

A C1/C2 short loop on chromosome 1 whose identifier is
3206 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene rrsC and has the 
DNA sequence 

Seq. Id. = 78 Position = 1 to 553

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAGGGGTTCG
AATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTCACCTGCCTTAATAT
CTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCA
AGCTGAAAATTGAAACACTGAACAACGAAAGTTGTTCGTGAGTCTCTCAAATTTTCG
CAACACGATGATGAATCGAAAGAAACATCTTCGGGTTGTGAGGTTAAGCGACTAAGC
GTACACGGTGGATGCCCTGGCAGTCAGAGGCGATGAAGGACGTGCTAATCTGCGATA
AGCGTCGGTAAGGTGATATGAACCGTTATAACCGGCGATTTCCGAATGGGGAAACCC
AGTGTGTTTCGACACACTATCATTAACTGAATCCATAGGTTAATGAGGCGAACCGGG
GGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTA
GCGGCGAGCGAACGGGGAGCAGCCCAGAGCCTGAATCAGT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 78 Position = 1 to 86

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAGGGGTTCG
AATCCCCTAGGGGACGCCACTTGCTGGTT



The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 78 Position = 1 to 113

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAGGGGTTCG 
AATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTCACCTGCCTTAATA 



A double stranded DNA loop of length 41.336 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 3439. This T1 control element has
the DNA sequence

Seq. Id. = 79 Position = 1 to 94

CCTTAATATCTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTTAAAAA 
TCTGGATCAAGCTGAAAATTGAAACACTGAACAACGA 

This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 3478. This T2 
control element has the DNA sequence 

Seq. Id. = 80 Position = 1 to 94

GTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCAAGCTGAAAATTGAAACAC 
TGAACAACGAAAGTTGTTCGTGAGTCTCTCAAATTTT 

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 



rrlB rrfB murB coaA b3975 

tyrU thrT tufB secE nusG 

rplK rplA rplJ rplL rpoB 

rpoC htrC thiH thiF thiE 

yjaE yjaD hemE nfi yjaG

hupA yjaH yjaI hydH purD

purH gltV 



The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 
3206 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene rrsC and has the
DNA sequence

Seq. Id. = 81 Position = 1 to 367

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAGGGGTTCG
AATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTCACCTGCCTTAATAT 
CTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCA 
AGCTGAAAATTGAAA. . . ACCGGCGATTTCCGAATGGGGAAACCCAGTGTGTTTCGA 
CACACTATCATTAACTGAATCCATAGGTTAATGAGGCGAACCGGGGGAACTGAAACA
TCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAA
CGGGGAGCAGCCCAGAGCCTGAATCAGT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 81 Position = 106 to 199



CCTTAATATCTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTTAAAAA 
TCTGGATCAAGCTGAAAATTGAAACACTGAACAACGA 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 81 Position = 133 to 226

GTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCAAGCTGAAAATTGAAACAC
TGAACAACGAAAGTTGTTCGTGAGTCTCTCAAATTTT



A double stranded DNA loop of length 38.285 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 3441. This T1 control element has
the DNA sequence

Seq. Id. = 82 Position = 1 to 355

AATTTTCGCAACACGATGATGAATCGAAAGAAACATCTTCGGGTTGTGAGGTTAAGC
GACTAAGCGTACACGGTGGATGCCCTGGCAGTCAGAGGCGATGAAGGACGTGCTAAT
CTGCGATAAGCGTCGGTAAGGTGATATGAACCGTTATAACCGGCGATTTCCGAATGG
GGAAACCCAGTGTGT . . . GATGAGAGAAGATTTTCAGCCTGATACAGATTAAATCAG
AACGCAGAAGCGGTCTGATAAAACAGAATTTGCCTGGCGGCAGTAGCGCGGTGGTCC
CACCTGACCCCATGCCGAACTCAGAAGTGAAACGCCGTAGCGCCGATGGTAGTGTGG 
GGTCTCCCCATGCGAG 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 3479. This T2
control element has the DNA sequence

Seq. Id. = 83 Position = 1 to 356

AAGAAACATCTTCGGGTTGTGAGGTTAAGCGACTAAGCGTACACGGTGGATGCCCTG
GCAGTCAGAGGCGATGAAGGACGTGCTAATCTGCGATAAGCGTCGGTAAGGTGATAT
GAACCGTTATAACCGGCGATTTCCGAATGGGGAAACCCAGTGTGTTTCGACACACTA
TCATTAACTGAATCC . . . CAGATTAAATCAGAACGCAGAAGCGGTCTGATAAAACAG 
AATTTGCCTGGCGGCAGTAGCGCGGTGGTCCCACCTGACCCCATGCCGAACTCAGAA 
GTGAAACGCCGTAGCGCCGATGGTAGTGTGGGGTCTCCCCATGCGAGAGTAGGGAAC 
TGCCAGGCATCAAATTA 


This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 



rrlB rrfB murB b3975 tyrU

thrT tufB secE nusG htrC

thiH thiF thiE yjaE yjaG

yjaH yjaI hydH purD purH

gltV









The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is
3206 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as a RNA
single strand that is 3'UTR to the gene rrsC and has the
DNA sequence

Seq. Id. = 84 Position = 1 to 519

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAGGGGTTCG 
AATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTCACCTGCCTTAATAT 
CTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCA
AGCTGAAAATTGAAAAATTTTCGCAACACGATGATGAATCGAAAGAAACATCTTCGG
GTTGTGAGGTTAAGCGACTAAGCGTACACGGTGGATGCCCTGGCAGTCAGAGGCGAT
GAAGGACGTGCTAATCTGCGATAAGCGTCGGTAAGGTGATATGAACCGTTATAACCG
GCGATTTCCGAATGGGGAAACCCAGTGTGTTTCGACACACTATCATTAACTGAATCC
ATAGGTTAATGAGGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGA
AATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAGCAGCCCAGAGCCTGA
ATCAGT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 84 Position = 187 to 519

AATTTTCGCAACACGATGATGAATCGAAAGAAACATCTTCGGGTTGTGAGGTTAAGC
GACTAAGCGTACACGGTGGATGCCCTGGCAGTCAGAGGCGATGAAGGACGTGCTAAT
CTGCGATAAGCGTCGGTAAGGTGATATGAACCGTTATAACCGGCGATTTCCGAATGG 
GGAAACCCAGTGTGTTTCGACACACTATCATTAACTGAATCCATAGGTTAATGAGGC 
GAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTC 
CCCCAGTAGCGGCGAGCGAACGGGGAGCAGCCCAGAGCCTGAATCAGT 


The match between the T2 sequence and the C1/C2 sequence 
is 



Seq. Id. = 84 Position = 214 to 519 

AAGAAACATCTTCGGGTTGTGAGGTTAAGCGACTAAGCGTACACGGTGGATGCCCTG 
GCAGTCAGAGGCGATGAAGGACGTGCTAATCTGCGATAAGCGTCGGTAAGGTGATAT
GAACCGTTATAACCGGCGATTTCCGAATGGGGAAACCCAGTGTGTTTCGACACACTA 
TCATTAACTGAATCCATAGGTTAATGAGGCGAACCGGGGGAACTGAAACATCTAAGT 
ACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAG 
CAGCCCAGAGCCTGAATCAGT 

Example of a one-to-many connectron in archea - M.
jannaschii

In this example the existence of the T1-T2 (534-611,
1139-1159, and 1630-1643) long loops is controlled by one
C1/C2 short loop (1642).



            1642  Chromosome 1
              |
            * * *
|        Chromosome 1        |
534                        611

            1642  Chromosome 1
              |
            * * *
|        Chromosome 1        |
1139                      1159

            1642  Chromosome 1
              |
            * * *
|        Chromosome 1        |
1630                      1643
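The one-to-many relationship drawn above can be captured in a small data structure. The following is a minimal sketch and is not part of the original disclosure: the class and field names (`LongLoop`, `ShortLoop`, `controls`) are hypothetical, while the identifiers, chromosome numbers, and loop lengths are the ones stated for this M. jannaschii example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LongLoop:
    """A T1-T2 long loop, keyed by its T1 and T2 identifiers (hypothetical type)."""
    t1_id: int
    t2_id: int
    chromosome: int
    length_kb: float


@dataclass
class ShortLoop:
    """A C1/C2 short loop and the long loops it controls (hypothetical type)."""
    ident: int
    chromosome: int
    controls: List[LongLoop] = field(default_factory=list)


# Values taken from the M. jannaschii example in the text.
loop_1642 = ShortLoop(
    ident=1642,
    chromosome=1,
    controls=[
        LongLoop(534, 611, 1, 72.886),
        LongLoop(1139, 1159, 1, 14.509),
        LongLoop(1630, 1643, 1, 4.998),
    ],
)

for loop in loop_1642.controls:
    print(
        f"C1/C2 {loop_1642.ident} -> T1 {loop.t1_id} / T2 {loop.t2_id} "
        f"({loop.length_kb} kb, chromosome {loop.chromosome})"
    )
```

Storing the long loops as a list under the short loop mirrors the direction of control described in the example: one C1/C2 identifier, several T1-T2 pairs.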



A double stranded DNA loop of length 72.886 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 534. This T1 control element has the
DNA sequence

Seq. Id. = 85 Position = 1 to 37

TAAGTAAATAAAATTTCTCTAACAAATAAGTTAAATT

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 611. This T2
control element has the DNA sequence

Seq. Id. = 86 Position = 1 to 59

TAAATAAAATTTCTCTAACAAATAAGTTAAATTTTTGGATTTAAAAAGATAAAAATG 
CT 


This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

MJ0564



The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is
1642 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ1602 and has
the DNA sequence

Seq. Id. = 87 Position = 1 to 177

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAAT 

TATTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAATTTCTCTAACA 

AATAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCTGTTTTATTATGGAAAG 
AAAGAT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 87 Position = 92 to 127

AAGTAAATAAAATTTCTCTAACAAATAAGTTAAATT

The match between the T2 sequence and the C1/C2 sequence
is

TAAATAAAATTTCTCTAACAAATAAGTTAAATTTTTGGATTTAAAAAGATAAAAAT
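The position ranges reported for each match can be reproduced by locating the longest stretch that the target and control sequences share. The sketch below is illustrative only: this passage does not say how the matches were computed, so a standard longest-common-substring search is assumed as a stand-in, and the function name `longest_common_substring` is hypothetical. The two sequences are Seq. Id. 85 (T1) and Seq. Id. 87 (C1/C2) as listed above.

```python
def longest_common_substring(target: str, control: str):
    """Return (start, end, text) for the longest substring shared by both
    sequences, with start/end as 1-based inclusive positions in `control`."""
    best_len, best_end = 0, 0
    prev = [0] * (len(control) + 1)
    for i in range(1, len(target) + 1):
        curr = [0] * (len(control) + 1)
        for j in range(1, len(control) + 1):
            if target[i - 1] == control[j - 1]:
                # Extend the common run that ended at (i-1, j-1).
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], j
        prev = curr
    start = best_end - best_len + 1
    return start, best_end, control[start - 1:best_end]


# Seq. Id. 85: the T1 control element (identifier 534).
T1 = "TAAGTAAATAAAATTTCTCTAACAAATAAGTTAAATT"

# Seq. Id. 87: the C1/C2 short loop (identifier 1642), 177 bases.
C1C2 = (
    "ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAAT"
    "TATTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAATTTCTCTAACA"
    "AATAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCTGTTTTATTATGGAAAG"
    "AAAGAT"
)

start, end, text = longest_common_substring(T1, C1C2)
print(f"Position = {start} to {end}")
print(text)
```

The same function can be applied to the T2 control element (Seq. Id. 86) to locate its match within the C1/C2 sequence.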




A double stranded DNA loop of length 14.509 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 1139. This T1 control element has
the DNA sequence

Seq. Id. = 88 Position = 1 to 78

ATTTATTAATTAGTTCAAAGGATTTTTATTTAATTTCTAAGGGTTAGCTGGTTTGAT 
TGTTTAAAATATTTGAGTTTA 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 1159. This T2
control element has the DNA sequence

Seq. Id. = 89 Position = 1 to 78

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAAT 
TATTCAGATTTTTAAAAATTA 

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

MJ1096 MJ1097 tRNA-Arg-3 MJ1098 MJ1099

MJ1100 MJ1101 MJ1102 MJ1103 MJ1104

MJ1105 MJ1106 MJ1107 MJ1108



The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is
1642 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ1602 and has
the DNA sequence

Seq. Id. = 90 Position = 1 to 177

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAAT 
TATTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAATTTCTCTAACA 
AATAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCTGTTTTATTATGGAAAG 
AAAGAT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 90 Position = 1 to 31

ATTTAATTTCTAAGGGTTAGCTGGTTTGATT 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 90 Position = 1 to 78

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAAT
TATTCAGATTTTTAAAAATTA



A double stranded DNA loop of length 4.998 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 1630. This T1 control element has
the DNA sequence

Seq. Id. = 91 Position = 1 to 175

TTATTAATTAGTTCAAAGGATTTTTATTTAATTTCTAAGGGTTTGCTGGTTTGATTA 
TTTAGAATATTTGAGTTTATTGAATTATTCAGATTTTTAAAAATTAAGATTAATTAG
GAAAGGAAATAAGATTTCTCTAACAGACAAGTTAAATTTTTGGATTTAAAAAGATAA 
AAAT 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 1643. This T2
control element has the DNA sequence

Seq. Id. = 92 Position = 1 to 175

TTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAATTA
TTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAATTTCTCTAACAAA 
TAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCTGTTTTATTATGGAAAGAA 
AGAT 

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

MJ1597 MJ1598 MJ1599 MJ1600 MJ1601

MJ1602

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is
1642 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ1602 and has
the DNA sequence

Seq. Id. = 93 Position = 1 to 177

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAAT
TATTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAATTTCTCTAACA 
AATAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCTGTTTTATTATGGAAAG 
AAAGAT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 93 Position = 20 to 78

GCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAATTATTCAGATTTTTAAAAAT
TA 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 93 Position = 3 to 177

TTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAATTA
TTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAATTTCTCTAACAAA
TAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCTGTTTTATTATGGAAAGAA
AGAT 



Example of a one-to-many connectron in single-cell
eukaryotes - S. cerevisiae

In this example the existence of the T1-T2 long loops
(158-171, 293-317, 4295-4308 and 5916-5923) is controlled
by one C1/C2 short loop (86).



             86   Chromosome 1
              |
            * * *
|        Chromosome 1        |
158                        171

             86   Chromosome 1
              |
            * * *
|        Chromosome 1        |
293                        317

             86   Chromosome 1
              |
            * * *
|        Chromosome 10       |
4295                      4308

             86   Chromosome 1
              |
            * * *
|        Chromosome 13       |
5916                      5923



A double stranded DNA loop of length 20.391 kilo-bases on
chromosome 2 is bounded on the left by a T1 sequence
whose identifier is 158. This T1 control element has the
DNA sequence

Seq. Id. = 94 Position = 1 to 153

CCAATTGTTGGAATAAAAATCAACTATCATCTACTAACTAGTATTTACGTTACTAGT 
ATATTATCATATACGGTGTTAGAAGATGACGCAAATGATGAGAAATAGTCATCTAAA 
TTAGTGGAAGCTGAAACGCAAGGATTGATAATGTAATAG 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 171. This T2
control element has the DNA sequence

Seq. Id. = 95 Position = 1 to 192

ATAATTGTTGGAATAAAAATCAACTATCATCTACTAACTAGTATTTACGTTACTAGT 
ATATTATCATATACGGTGTTAGAAGATGACACAAATGATGAGAAATAGTCATCTAAA 
TTAGTGGAAGCTGAAACGCAAGGATTGATAATGTAATAGGATCAATGAATATTAACA 
TATAAAATGATGATAATAATA 


This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

YBL107W-A TL(UAA)B1 YBL107C YBL106C YBL105C

YBL104C YBL103C YBL102W YBL101C

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 86
controls the expression of the genes in this T1/T2 long
loop. This C1/C2 short loop is expressed as an RNA single
strand that is 3'UTR to the gene YAR009C and has the DNA
sequence

Seq. Id. = 96 Position = 1 to 362

ATCTATTACATTATGGGTGGTATGTTGGAATAGAAATCAACTATCATCTACTAACTA 
GTATTTACATTACTAGTATATTATCATATACGGTGTTAGAAGATGACGCAAATGATG 
AGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCAAGGATTGATAATGTAATAGG 
ATCAATGAATATAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTA 
GAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAGAACTTCTAGTATAT
TCTGTATACCTAATATTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAA 
CATTCACCCATTTCTCAGAA 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 96 Position = 34 to 65

AAATCAACTATCATCTACTAACTAGTATTTAC

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 96 Position = 34 to 65

AAATCAACTATCATCTACTAACTAGTATTTAC



A double stranded DNA loop of length 38.470 kilo-bases on
chromosome 2 is bounded on the left by a T1 sequence
whose identifier is 293. This T1 control element has the
DNA sequence

Seq. Id. = 97 Position = 1 to 258



GAATTGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTATCAATA 
TATTATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAAGCTGTCATCGAAG 
TTAGAGGAAGCTGAAGTGCAAGGATTGATAATGTAATAGGATAATGAAACATATAAA 
ACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGG 
ATTCCTATATCCTTGAGGAGAACTTCTAGT

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 317. This T2
control element has the DNA sequence

Seq. Id. = 98 Position = 1 to 77

AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAG 
AACTTCTAGTATATTCTGTA 

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

YBL005W-B TS(AGA)B YBL004W YBL003C YBL002W

YBL001C YBR001C YBR002C YBR003W YBR004C

YBR005W YBR006W YBR007C YBR008C YBR009C

YBR010W YBR011C YBR012C



The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 86
controls the expression of the genes in this T1/T2 long
loop. This C1/C2 short loop is expressed as an RNA single
strand that is 3'UTR to the gene YAR009C and has the DNA
sequence

Seq. Id. = 99 Position = 1 to 362

ATCTATTACATTATGGGTGGTATGTTGGAATAGAAATCAACTATCATCTACTAACTA 
GTATTTACATTACTAGTATATTATCATATACGGTGTTAGAAGATGACGCAAATGATG
AGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCAAGGATTGATAATGTAATAGG 
ATCAATGAATATAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTA 
GAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAGAACTTCTAGTATAT 
TCTGTATACCTAATATTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAA 
CATTCACCCATTTCTCAGAA

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 99 Position = 181 to 264

AAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATT 
CCATTTTGAGGATTCCTATATCCT 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 99 Position = 215 to 291

AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAG
AACTTCTAGTATATTCTGTA 






A double stranded DNA loop of length 11.020 kilo-bases on
chromosome 10 is bounded on the left by a T1 sequence
whose identifier is 4295. This T1 control element has
the DNA sequence

Seq. Id. = 100 Position = 1 to 145

AAACGCAAGGATTGATAATGTAATAGGATCAATGAATATAAACATATAAAACGGAAT 
GAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTA 
TATCCTCGAGGAGAACTTCTAGTATATTCTG 

This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 4308. This T2 
control element has the DNA sequence 

Seq. Id. = 101 Position = 1 to 180

GGAAGCTGAAACGCAAGGATTGATAATGTAATAGGATCAATGAATATAAACATATAA 

AACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAG 

GATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTATACCTAATATTATAGC 
CTTTATCAA 

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 

YJR027W YJR029W 

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 87
controls the expression of the genes in this T1/T2 long
loop. This C1/C2 short loop is expressed as an RNA single
strand that is 3'UTR to the gene YAR009C and has the DNA
sequence

Seq. Id. = 102 Position = 1 to 359

ATCTATTACATTATGGGTGGTATGTTGGAATAGAAATCAACTATCATCTACTAACTA 
GTATTTACATTACTAGTATATTATCATATACGGTGTTAGAAGATGACGCAAATGATG
AGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCAAGGATTGATAATGTAATAGG 
ATCAATGAATATAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTA 
GAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAGAACTTCTAGTATAT 
TCTGTATACCTAATATTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAA 
CATTCACCCATTTCTCA



A double stranded DNA loop of length 5.462 kilo-bases on
chromosome 13 is bounded on the left by a T1 sequence
whose identifier is 5916. This T1 control element has
the DNA sequence

Seq. Id. = 103 Position = 1 to 146

AAGCTGAAGTGCAAGGATTGATAATGTAATAGGATAATGAAACATATAAAACGGAAT 
GAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTA 
TATCCTCGAGGAGAACTTCTAGTATATTCTGTA 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 5923. This T2
control element has the DNA sequence



Seq. Id. = 104 Position = 1 to 146 



IMTAATAGGATAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTA 
TGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAGAACTTCTAGT
ATATTCTGTATACCTAATATTATAGCCTTTATCAA 

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

YML045W

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 87
controls the expression of the genes in this T1/T2 long
loop. This C1/C2 short loop is expressed as an RNA single
strand that is 3'UTR to the gene YAR009C and has the DNA
sequence

Seq. Id. = 105 Position = 1 to 359

ATCTATTACATTATGGGTGGTATGTTGGAATAGAAATCAACTATCATCTACTAACTA 
GTATTTACATTACTAGTATATTATCATATACGGTGTTAGAAGATGACGCAAATGATG
AGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCAAGGATTGATAATGTAATAGG 
ATCAATGAATATAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTA 
GAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAGAACTTCTAGTATAT 
TCTGTATACCTAATATTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAA 
CATTCACCCATTTCTCA






Example of a one-to-many connectron in multi-cell
eukaryotes - C. elegans

In this example the existence of the T1-T2 long loops
(16554-16661 and 21565-21590) is controlled by one C1/C2
short loop (21591).



            21591  Chromosome 5
              |
            * * *
|        Chromosome 4        |
16554                    16661

            21591  Chromosome 5
              |
            * * *
|        Chromosome 5        |
21565                    21590



A double stranded DNA loop of length 50.159 kilo-bases on
chromosome 4 is bounded on the left by a T1 sequence
whose identifier is 16554. This T1 control element has
the DNA sequence

Seq. Id. = 106 Position = 1 to 143

TGCCTGAAAAAATTGGCTCCGAGTTAGGACACTTGGGGTGGTCAAAAAATTTTGTGA
CTATTGTCAAATGAAAGATCATAGTTGATAACATAAATTCCCAAAGTTTCATAAAAA 
TCGATACGCAGCGAACAAAGTTATCAATT 



This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 16661. This T2
control element has the DNA sequence

Seq. Id. = 107 Position = 1 to 141

CACTTGGGGTGGTCAAAAAATTTTGTGATTATTGTCAAATGAAAGATCATGGTTGAT 
AACATAAATTCCCAAAGTTTCATAAAAATCGATACGCAGCGAACAAAGTTATGATTT 
TTGACCCGGAACTTATTTGGAGACCTA 

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

C23H5.7 C23H5.8a C23H5.3 C23H5.2 C23H5.9

C23H5.1

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 5 whose identifier is
21591 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene F25A2.1 and has
the DNA sequence

Seq. Id. = 108 Position = 1 to 117

TATTGTCAAATGAAAGATCATGGTTGATAACATAAATTCCCACAATTTCATAAAAAT 

CGATACGCAGCGAACAAAGTTATGATTTTTGACCCGGAACTTATTTGGAGACCTAAT 
ATT 



The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 108 Position = 46 to 85

TTTCATAAAAATCGATACGCAGCGAACAAAGTTAT 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 108 Position = 1 to 42

TATTGTCAAATGAAAGATCATGGTTGATAACATAAATTCCCA



A double stranded DNA loop of length 18.142 kilo-bases on
chromosome 5 is bounded on the left by a T1 sequence
whose identifier is 21565. This T1 control element has
the DNA sequence

Seq. Id. = 109 Position = 1 to 72

CTCCGAGTTAGGACACTTGGGGTGGACAAAAAATTTTGTGACTATTGTCAAATGAAA
GATCATGGTTGATAA 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 21590. This T2
control element has the DNA sequence

Seq. Id. = 110 Position = 1 to 115



TATTGTCAAATGAAAGATCATGGTTGATAACATAAATTCCCACAATTTCATAAAAAT 

CGATACGCAGCGAACAAAGTTATGATTTTTGACCCGGAACTTATTTGGAGACCTAAT 
A 

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 

T21H3.2 T21H3.1 F25A2.1

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 5 whose identifier is
21591 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene F25A2.1 and has
the DNA sequence

Seq. Id. = 111 Position = 1 to 117

TATTGTCAAATGAAAGATCATGGTTGATAACATAAATTCCCACAATTTCATAAAAAT 
CGATACGCAGCGAACAAAGTTATGATTTTTGACCCGGAACTTATTTGGAGACCTAAT 
ATT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 111 Position = 1 to 30

TATTGTCAAATGAAAGATCATGGTTGATAA



The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 111 Position = 1 to 115

TATTGTCAAATGAAAGATCATGGTTGATAACATAAATTCCCACAATTTCATAAAAAT 
CGATACGCAGCGAACAAAGTTATGATTTTTGACCCGGAACTTATTTGGAGACCTAAT 



4. Connectrons occur between prokaryotes and
their plasmids.

Connectron relationships exist between prokaryotes and
their plasmids. These connectrons implement a control
mechanism between the two genomes that makes it possible
for them to form a symbiotic relationship. In the case
of D. radiodurans the relationship is not symmetric. The
D. radiodurans genome sends C1/C2 short loops to the MP1
plasmid.

Example of a prokaryote/plasmid connectron - D.
radiodurans

In this example the existence of the T1-T2 long loops
(2654-2694 and 2692-2749) in chromosome 3, which is the
plasmid MP1, is controlled by one C1/C2 short loop (16) in
chromosome 1.



          16    Chromosome 1
          2768  Chromosome 3 (plasmid MP1)
          2653  Chromosome 3 (plasmid MP1)
            |
          * * *
|    Chromosome 3 (plasmid MP1)    |
2654                            2694
|              2693                |

          16    Chromosome 1
          2768  Chromosome 3 (plasmid MP1)
          2693  Chromosome 3 (plasmid MP1)
            |
          * * *
|    Chromosome 3 (plasmid MP1)    |
2692                            2749
|          2693      2695          |



A double stranded DNA loop of length 46.903 kilo-bases on
chromosome 3 (plasmid MP1) is bounded on the left by a T1
sequence whose identifier is 2654. This T1 control
element has the DNA sequence

Seq. Id. = 112 Position = 1 to 274

CAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTATGC 
AGCCTGCTCGGAGAGTACGATTCGTCGTTGGCTGCACCGAAGTGACGATGGGGCCAT 
TCCGTGGGGCGCGTTACACCAGGCGACTGTCAGTACAGCAATCGAGAGTGGGCTGAT 
CAGCCCACTGTGCGTTCTGGCCATCGACGCCTCTTTTCACCGCAAAGCCGGTCAGCA 
CACCGCACACCTCGGCTCGTTCTGGAATGGCTGTGCCGCGCGGACC 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 2694. This T2
control element has the DNA sequence

Seq. Id. = 113 Position = 1 to 274

GCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGGAGAGTACGATTCGTCG 
TTGGCTGCACCGAAGTGACGATGGGGCCATTCCGTGGGGCGCGTTACACCAGGCGAC 
TGTCAGTACAGCAATCGAGAGTGGGCTGATCAGCCCACTGTGCGTTCTGGCCATCGA 
CGCCTCTTTTCACCGCAAAGCCGGTCAGCACACCGCACACCTCGGCTCGTTCTGGAA 
TGGCTGTGCCGCGCGGACCGAACGCGGAATCGAGCAATCCTGTTGT 

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 

DRB0020 DRB0021 DRB0022 DRB0023 DRB0024

DRB0025 DRB0027 DRB0030 DRB0032 DRB0033 

DRB0034 DRB0035 DRB0037 DRB0038 DRB0039 



DRB0041 DRB0042 DRB0043 DRB0044 DRB0045 

DRB0047 DRB0051 DRB0052 DRB0054 DRB0055 
DRB0057 

This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose
identifier is 2693 controls the expression of the genes
of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to
the gene DRB0057 and has the DNA sequence

Seq. Id. = 114 Position = 1 to 103

CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGG 
AATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGGAC 

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 16
controls the expression of the genes in this T1/T2 long
loop. This C1/C2 short loop is expressed as an RNA single
strand that is 3'UTR to the gene DR0009 and has the DNA
sequence

Seq. Id. = 115 Position = 1 to 186

GCTGTGAAATCACCGCTTCCAATGGGTCTGATGGCCATCCTACAGTACGTTCTCAGC
GCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTT
CTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCG
GAGAGTACGATTCGT



The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 115 Position = 105 to 186

CAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTATGC 
AGCCTGCTCGGAGAGTACGATTCGT 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 115 Position = 132 to 186

GCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGGAGAGTACGATTCGT



A C1/C2 short loop on chromosome 3 (plasmid MP1) whose
identifier is 2768 controls the expression of the genes
in this T1/T2 long loop. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene DRB0133 and has the DNA sequence

Seq. Id. = 116 Position = 1 to 186

GCTGTGAAATCACCGCTTCCAATGGGTCTGATGGCCATCCTACAGTACGTTCTCAGC 
GCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTT
CTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCG 
GAGAGTACGATTCGT 



The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 116 Position = 105 to 186

CAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTATGC
AGCCTGCTCGGAGAGTACGATTCGT

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 116 Position = 132 to 186

GCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGGAGAGTACGATTCGT

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose
identifier is 2653 controls the expression of the genes
in this T1/T2 long loop. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene DRB0017 and has the DNA sequence

Seq. Id. = 117 Position = 1 to 186

CGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTTC 
TCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGG
AGAGTACGATTCGTCGTTGGCTGCACCGAAGTGACGATGGGGCCATTCCGTGGGGCG 
CGTTACACCAGGCGA 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 117 Position = 47 to 186

CAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTATGC
AGCCTGCTCGGAGAGTACGATTCGTCGTTGGCTGCACCGAAGTGACGATGGGGCCAT
TCCGTGGGGCGCGTTACACCAGGCGA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 117 Position = 74 to 186

GCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGGAGAGTACGATTCGTCG
TTGGCTGCACCGAAGTGACGATGGGGCCATTCCGTGGGGCGCGTTACACCAGGCGA



A double stranded DNA loop of length 68.612 kilo-bases on
chromosome 3 (plasmid MP1) is bounded on the left by a T1
sequence whose identifier is 2692. This T1 control
element has the DNA sequence

Seq. Id. = 118 Position = 1 to 103



CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGG 
AATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGGAC 



This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 2749. This T2
control element has the DNA sequence

Seq. Id. = 119 Position = 1 to 103

AGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTT
TTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGT

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

DRB0059 DRB0060 DRB0061 DRB0062 DRB0064

DRB0065 DRB0066 DRB0067 DRB0068 DRB0069

DRB0070 DRB0072 DRB0073 DRB0074 DRB0076

DRB0077 DRB0079 DRB0080 DRB0081 DRB0083

DRB0085 DRB0086 DRB0087 DRB0088 DRB0089

DRB0090 DRB0092 DRB0093 DRB0094 DRB0096

DRB0097 DRB0098 DRB0102 DRB0103 DRB0104

DRB0105 DRB0106 DRB0107 DRB0111 DRB0112

This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops





A C1/C2 short loop on chromosome 3 (plasmid MP1) whose
identifier is 2693 controls the expression of the genes
of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to
the gene DRB0057 and has the DNA sequence

Seq. Id. = 120 Position = 1 to 103

CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGG
AATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGGAC

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose
identifier is 2695 controls the expression of the genes
of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to
the gene DRB0057 and has the DNA sequence

Seq. Id. = 121 Position = 1 to 274

GCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGGAGAGTACGATTCGTCG 
TTGGCTGCACCGAAGTGACGATGGGGCCATTCCGTGGGGCGCGTTACACCAGGCGAC
TGTCAGTACAGCAATCGAGAGTGGGCTGATCAGCCCACTGTGCGTTCTGGCCATCGA 
CGCCTCTTTTCACCGCAAAGCCGGTCAGCACACCGCACACCTCGGCTCGTTCTGGAA 
TGGCTGTGCCGCGCGGACCGAACGCGGAATCGAGCAATCCTGTTGT 

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 16
controls the expression of the genes in this T1/T2 long
loop. This C1/C2 short loop is expressed as an RNA single
strand that is 3'UTR to the gene DR0009 and has the DNA
sequence

Seq. Id. = 122 Position = 1 to 186

GCTGTGAAATCACCGCTTCCAATGGGTCTGATGGCCATCCTACAGTACGTTCTCAGC 
GCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTT 



CTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCG 
GAGAGTACGATTCGT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 122 Position = 28 to 130

CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGG
AATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGGAC

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 122 Position = 55 to 157

AGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTT 
TTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGT 
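The stated position ranges can be checked directly by slicing the C1/C2 sequence. The following is a small sketch under stated assumptions: positions are taken to be 1-based and inclusive (which is consistent with the lengths given above), and the helper name `region` is hypothetical. The sequences are Seq. Id. 118 (T1), Seq. Id. 119 (T2), and Seq. Id. 122 (C1/C2) as listed in this example.

```python
def region(seq: str, start: int, end: int) -> str:
    """Return the bases at 1-based inclusive positions start..end."""
    return seq[start - 1:end]


# Seq. Id. 122: C1/C2 short loop 16 on chromosome 1 (186 bases).
C1C2 = (
    "GCTGTGAAATCACCGCTTCCAATGGGTCTGATGGCCATCCTACAGTACGTTCTCAGC"
    "GCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTT"
    "CTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCG"
    "GAGAGTACGATTCGT"
)

# Seq. Id. 118: T1 control element 2692 (103 bases).
T1 = (
    "CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGG"
    "AATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGGAC"
)

# Seq. Id. 119: T2 control element 2749 (103 bases).
T2 = (
    "AGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTT"
    "TTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGT"
)

# Compare each stated range against the corresponding target sequence.
t1_ok = region(C1C2, 28, 130) == T1
t2_ok = region(C1C2, 55, 157) == T2
print("T1 match at 28 to 130:", t1_ok)
print("T2 match at 55 to 157:", t2_ok)
```

Because both ranges index into the same C1/C2 sequence, the check also makes it easy to see where the T1 and T2 matches sit relative to one another.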

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose
identifier is 2768 controls the expression of the genes
in this T1/T2 long loop. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene DRB0133 and has the DNA sequence

Seq. Id. = 123 Position = 1 to 309

GCTGTGAAATCACCGCTTCCAATGGGTCTGATGGCCATCCTACAGTACGTTCTCAGC
GCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTT
CTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCG
GAGAGTACGATTCGTCGGACCGAACGCGGAATCGAGCAATCCTGTTGTGCCCTCATT
GATGTCCAGCACCGGCAGGCCTTGACGGTCGATGTCCGTCAGACCCTGACCGGGTCT
GAGGCTCCAACTCGTCTGGAACAG

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 123 Position = 28 to 130

CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGG
AATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGGAC

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 123 Position = 55 to 157

AGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTT 
TTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGT 

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose
identifier is 2693 controls the expression of the genes
in this T1/T2 long loop. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene DRB0057 and has the DNA sequence

Seq. Id. = 124 Position = 1 to 103



CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGG 
AATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGGAC 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 124 Position = 1 to 103

CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGG
AATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGGAC

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 124 Position = 28 to 103

AGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTT 
TTTCTCGCTGTTCCTGGAC 

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose
identifier is 2653 controls the expression of the genes
in this T1/T2 long loop. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene DRB0017 and has the DNA sequence

Seq. Id. = 125 Position = 1 to 186

CGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTTC
TCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGG
AGAGTACGATTCGTCGTTGGCTGCACCGAAGTGACGATGGGGCCATTCCGTGGGGCG
CGTTACACCAGGCGA 



The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 125 Position = 1 to 172

CGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTTC 
TCGCTGTTCCTGGAC 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 125 Position = 1 to 99

CGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTTC
TCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGT



5. Connectrons occur in plants and higher
animals

Connectron relationships exist in plants and higher
animals.

Example of a plant connectron - A. thaliana

In this example the existence of the T1-T2 (423-469)
long loop is controlled by six C1/C2 short loops (972,
21396, 422, 21762, 21813 and 10882). The T1-T2 long loop
controls the expression of six genes on chromosome 2 in
addition to two C1/C2 (426 and 430) short loops.



          972    Chromosome 2
          21396  Chromosome 4
          422    Chromosome 2
          21762  Chromosome 4
          21813  Chromosome 4
          10882  Chromosome 4
            |
          * * *
|        Chromosome 2        |
423                        469
|        426      430        |



A double stranded DNA loop of length 42.285 kilo-bases on
chromosome 2 is bounded on the left by a T1 sequence
whose identifier is 423. This T1 control element has the
DNA sequence

Seq. Id. = 126 Position = 1 to 67



TATCTCTTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAA 
AAACGAAATA 



This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 469. This T2
control element has the DNA sequence

Seq. Id. = 127 Position = 1 to 67

TACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATACATTATTAATTTTCAAA
AATAATAACC



This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

At2g02070 At2g02080 At2g02090 At2g02100 At2g02120
At2g02130



This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops

A C1/C2 short loop on chromosome 2 whose identifier is
426 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene At2g02060 and has the DNA sequence

Seq. Id. = 128 Position = 1 to 55

TTCCAAAAATAATAACCAATCAAAATCAACATATAAGATTTGATATCTAAATTTT



A C1/C2 short loop on chromosome 2 whose identifier is
430 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene At2g02060 and has the DNA sequence

Seq. Id. = 129 Position = 1 to 55

TTGCGGAAAAATAATATCATCATTATAAAAAAATAATTAGAGTTTTTTCGCATAT 

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 2 whose identifier is
972 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene At2g04240 and has
the DNA sequence

Seq. Id. = 130 Position = 1 to 118

GTATGCCATTAGAAATAAAATTTTAAAAGTAAATTAATTCATCTCTTTAAAAATTAA 
AAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATACATTATTA 
ATTT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 130 Position = 53 to 106



ATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATA 
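
Each match is reported as an inclusive, 1-based position range, so the listed fragment should contain exactly end - start + 1 bases. The sketch below checks that for the T1 match above. The helper names `parse_header` and `span_matches` are assumptions made for this illustration, and the regular expression only covers the header layout used in this listing.

```python
import re


def parse_header(header):
    """Return (seq_id, start, end) from a 'Seq. Id. = N Position = A to B' line."""
    m = re.search(r"Seq\. Id\. = (\d+) Position = (\d+) to (\d+)", header)
    if m is None:
        raise ValueError(f"unrecognized header: {header!r}")
    return tuple(int(g) for g in m.groups())


def span_matches(header, sequence):
    """True when the inclusive position range has the same length as the sequence."""
    _, start, end = parse_header(header)
    return end - start + 1 == len(sequence)


# T1 match against the C1/C2 short loop 972, copied from the listing above.
t1_match = "ATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATA"

print(parse_header("Seq. Id. = 130 Position = 53 to 106"))
print(span_matches("Seq. Id. = 130 Position = 53 to 106", t1_match))
```

A check of this kind is a quick way to catch a position range that has been mistyped relative to its sequence.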



The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 130 Position = 67 to 118

TACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATACATTATTAATTT 

A C1/C2 short loop on chromosome 4 whose identifier is
21396 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene At4g15300 and has
the DNA sequence

Seq. Id. = 131 Position = 1 to 122

TGCCATTAGAAATAAAATTTTAAAGAGTAAATTAATTTATCTCTTTAAGGATTAAAA 
AGTCAAATACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATACATTATTAAT 
TTCCAAAA 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 131 Position = 38 to 104

TATCTCTTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAA
AAACGAAATA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 131 Position = 65 to 116

TACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATACATTATTAATTT

A C1/C2 short loop on chromosome 2 whose identifier is
422 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene At2g02060 and has
the DNA sequence

Seq. Id. = 132 Position = 1 to 137

TAACCTTAATTTTTGTAAGTAATTATATAGGTATGCCATTAGAAATAAAATTTTAAA 
GAGTAAATTAATTTATCTCTTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATT 
AAATTTAATTAAAAAACGAAATA 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 132 Position = 71 to 137

TATCTCTTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAA
AAACGAAATA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 132 Position = 98 to 137

TACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATA

A C1/C2 short loop on chromosome 4 whose identifier is
21762 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene At4g17510 and has
the DNA sequence

Seq. Id. = 133 Position = 1 to 65

TTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAAAAACGA 
AATACATT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 133 Position = 1 to 61

TTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAAAAACGA
AATA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 133 Position = 22 to 65

TACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATACATT

A C1/C2 short loop on chromosome 4 whose identifier is
21813 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene At4g17680 and has
the DNA sequence

Seq. Id. = 134 Position = 1 to 65



TTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAAAAACGA 
AATACATT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 134 Position = 1 to 61

TTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAAAAACGA
AATA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 134 Position = 22 to 65

TACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATACATT

A C1/C2 short loop on chromosome 2 whose identifier is
10882 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene At2g26540 and has
the DNA sequence

Seq. Id. = 135 Position = 1 to 56

TATCTCTTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAA

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 135 Position = 1 to 56

TATCTCTTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 135 Position = 28 to 56

TACTAATTTAATTAATTAAATTTAATTAA



Example of an animal connectron - D. melanogaster

A double stranded DNA loop of length 88.159 kilo-bases on
chromosome 4 is bounded on the left by a T1 sequence
whose identifier is 3340. This T1 control element has
the DNA sequence

Seq. Id. = 136 Position = 1 to 132

ACCTAAAAGAAGTACCGTTTTTTACTCCTAATTACCAATTCTAACCATCCATATCAC 
TTTTTGACGGACTCCGTGAAAATAATTTTTGGCCAAATTTTCGCATTTTTTGTAAGG 
GGTAACATCATAAAAATT 


This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 3372. This T2
control element has the DNA sequence

Seq. Id. = 137 Position = 1 to 136



AAAAAAGTACCGCGTTTTACTCCTAATTACCAATTCTAACCATCCATATCACTTTTT 
GACGGACTCCGTGAAAATAATTTTTGGCCAAATTTTCGCATTTTTTGTAAGGGGTAA 
CATCATCAAAATTTGCGAAAAA 

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

[Some of the following gene names have not been
determined.]

CG11207 Ork1 CG2186 CG2157

This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops

A C1/C2 short loop on chromosome 4 whose identifier is
3362 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene XXX and has the DNA sequence

Seq. Id. = 138 Position = 1 to 134

AAAAAAGTACCGCGTTTTACTCCTAATTACCAATTCTAACCATCCATATCACTTTTT
GACGGACTCCGTTAAAATAATTTTTGACCAAATTTTCGCATTTTTTGTAATCAAAAT
TTGCAAAAAATTGAAAAAAC

A C1/C2 short loop on chromosome 4 whose identifier is
3364 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene XXX and has the DNA sequence

Seq. Id. = 139 Position = 1 to 83

CAAAATTTGAATGCAAATCGATTGGGAATCAAAAAACAAACTCAACGAGGTATGACA 
TTCCATATTTGGGCCATTATTTCCAA 

A C1/C2 short loop on chromosome 4 whose identifier is
3366 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene XXX and has the DNA sequence

Seq. Id. = 140 Position = 1 to 62

TTTTTTCACAAAAATTAGGAAAATGATTTTGGGTAAAAAAATGAATATTTAAGTTGG 
GTTTT 


A C1/C2 short loop on chromosome 4 whose identifier is
3369 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene XXX and has the DNA sequence

Seq. Id. = 141 Position = 1 to 87

AAATCGATTGGGAATCAAAAAACAAACCTCAACGAGGTATGACATTCCATATCTGGG
CCATTATTTCCAATCTTTTGATCAAAATAC



The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 4 whose identifier is
3373 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene XXX and has the
DNA sequence

Seq. Id. = 142 Position = 1 to 136

AAAAAAGTACCGCGTTTTACTCCTAATTACCAATTCTAACCATCCATATCACTTTTT
GACGGACTCCGTGAAAATAATTTTTGGCCAAATTTTCGCATTTTTTGTAAGGGGTAA
CATCATCAAAATTTGCGAAAAA

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 142 Position = 15 to 120

TTTTACTCCTAATTACCAATTCTAACCATCCATATCACTTTTTGACGGACTCCGTGA
AAATAATTTTTGGCCAAATTTTCGCATTTTTTGTAAGGGGTAACATCAT

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 142 Position = 1 to 136

AAAAAAGTACCGCGTTTTACTCCTAATTACCAATTCTAACCATCCATATCACTTTTT
GACGGACTCCGTGAAAATAATTTTTGGCCAAATTTTCGCATTTTTTGTAAGGGGTAA
CATCATCAAAATTTGCGAAAAA






Example of an animal connectron - H. sapiens

All of the human genome has been fully sequenced by
both the NIH-led global sequencing project and the
Celera Genomics, Inc. project. The gene descriptors for
the human chromosomes do not yet exist. Without the positions
and directions of the genes, it is not possible to select
from among the possible connectrons to determine the real
connectrons.

Human chromosome 22 has been processed and there are 31,000
possible connectrons.

The gene descriptors for all the chromosomes of the human
genome should become available within the year.



6. Permanent connectrons exist in prokaryotes,
archea, single-celled eukaryotes and multi-celled
eukaryotes

C1/C2 short loops are normally expressed as the 3'UTR of
some gene. A class of connectron relationships exists
that permits one C1/C2 short loop to control the existence
of one or more T1-T2 long loops without being subject to
any expression controls other than those of the gene to
which the C1/C2 is 3'UTR. These connectron relationships
are described as "permanent". Permanent connectrons
exist in prokaryotes, archea, single-celled
eukaryotes and multi-celled eukaryotes.

Example of a prokaryote permanent connectron - E. coli 

In this example the existence of the T1-T2 (3200-3210)
long loop is controlled by a C1/C2 short loop (3432).
The expression of this C1/C2 short loop is controlled
only by the gene btuB.

3432 Chromosome 1
  |
* * *
| Chromosome 1 |
3200        3210



A double stranded DNA loop of length 93.339 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 3200. This T1 control element has
the DNA sequence

Seq. Id. = 143 Position = 1 to 378

AAGCGGCACTGCTCTTTAACAATTTATCAGACAATCTGTGTGGGCACTCGAAGATAC
GGATTCTTAACGTCGCAAGACGAAAAATGAATACCAAGTCTCAAGAGTGAACACGTA
ATTCATTACGAAGTTTAATTCTTTGAGCATCAAACTTTTAAATTGAAGAGTTTGATC
ATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAG
GAAACAGCTTGCTGTTTCGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAAC
TGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCA
AGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATC



This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 3310. This T2
control element has the DNA sequence

Seq. Id. = 144 Position = 1 to 378

CAGACAATCTGTGTGGGCACTCGAAGATACGGATTCTTAACGTCGCAAGACGAAAAA
TGAATACCAAGTCTCAAGAGTGAACACGTAATTCATTACGAAGTTTAATTCTTTGAG
CGTCAAACTTTTAAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAG
GCCTAACACATGCAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAG
TGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGG
AAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCT
CTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGT

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 



rrsC gltU rrlC rrfC aspT
trpT yifA yifE yifB ilvL
ilvG_1 ilvM ilvE ilvD ilvA
ilvY ilvC ppiC b3776 rep
gppA rhlB trxA rhoL rho
rfe wzzE wecB rffH wecD
wecE wzxE yifM_2 wecG yifK
argX hisR leuT proM aslB
aslA hemY hemX hemD cyaA
cyaY b3808 dapF uvrD b3814
corA yigF yigG rarD yigI
pldA recQ yigJ yigK pldB
yigL yigM metR metE ysgA
udp yigN ubiE yigP b3836
yigU yigW_1 rfaH yigC ubiB
fadA fadB pepQ trkH hemG







The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.





A C1/C2 short loop on chromosome 1 whose identifier is
3432 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene btuB and has the
DNA sequence

Seq. Id. = 145 Position = 1 to 520

AAGCGGCACTGCTCTTTAACAATTTATCAGACAATCTGTGTGGGCACTCGAAGATAC
GGATTCTTAACGTCGCAAGACGAAAAATGAATACCAAGTCTCAAGAGTGAACACGTA
ATTCATTACGAAGTTTAATTCTTTGAGCCAGACAATCTGTGTGGGCACTCGAAGATA
CGGATTCTTAACGTCGCAAGACGAAAAATGAATACCAAGTCTCAAGAGTGAACACGT
AATTCATTACGAAGTTTAATTCTTTGAGCGTCAAACTTTTAAATTGAAGAGTTTGAT
CATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACA
GGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAA
CTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGC
AAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATT
AGCTAGT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 145 Position = 1 to 142

AAGCGGCACTGCTCTTTAACAATTTATCAGACAATCTGTGTGGGCACTCGAAGATAC
GGATTCTTAACGTCGCAAGACGAAAAATGAATACCAAGTCTCAAGAGTGAACACGTA
ATTCATTACGAAGTTTAATTCTTTGAGC

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 145 Position = 143 to 520

CAGACAATCTGTGTGGGCACTCGAAGATACGGATTCTTAACGTCGCAAGACGAAAAA
TGAATACCAAGTCTCAAGAGTGAACACGTAATTCATTACGAAGTTTAATTCTTTGAG
CGTCAAACTTTTAAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAG
GCCTAACACATGCAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAG
TGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGG
AAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCT
CTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGT

Example of an archea permanent connectron - H. pylori 

In this example the existence of the T1-T2 (812-882)
long loop is controlled by a C1/C2 short loop (1241).
The expression of this C1/C2 short loop is controlled
only by the gene HP1535.

1241 Chromosome 1
  |
* * *
| Chromosome 1 |
812          882

A double stranded DNA loop of length 96.385 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 812. This T1 control element has the
DNA sequence

Seq. Id. = 146 Position = 1 to 43

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 882. This T2
control element has the DNA sequence

Seq. Id. = 147 Position = 1 to 43

TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG



This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is
1241 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene HP1535 and has
the DNA sequence

Seq. Id. = 148 Position = 1 to 56



TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCATTCATCCCAAACA 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 148 Position = 1 to 43

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 148 Position = 28 to 56

TAGCGGAACTAAAGCATTCATCCCAAACA
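
The match positions listed for this example can be reproduced by looking for the longest run of bases shared by the C1/C2 sequence and each target sequence. The following is a minimal sketch under that assumption. The text here does not state which matching procedure was used, so a plain longest-common-substring search stands in for it, and the function names are hypothetical.

```python
def longest_common_substring(a, b):
    """Return (start_in_a, length) of the longest substring of a also found in b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return best_end - best_len, best_len


def match_position(c1c2, target):
    """Inclusive 1-based (start, end) of the longest shared run within c1c2."""
    start, length = longest_common_substring(c1c2, target)
    return start + 1, start + length


# Sequences copied from the H. pylori example (Seq. Id. 148, 146 and 147).
C1C2 = "TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCATTCATCCCAAACA"
T1 = "TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA"
T2 = "TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG"

print(match_position(C1C2, T1))  # (1, 43)
print(match_position(C1C2, T2))  # (28, 56)
```

The quadratic table is adequate for sequences of a few hundred bases such as those listed here; a genome-scale search would need an index-based method instead.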



Example of a single-celled permanent connectron - S.
cerevisiae

In this example the existence of the T1-T2 (5515-5533)
long loop is controlled by a C1/C2 short loop (6102).
The expression of this C1/C2 short loop is controlled
only by the gene YNL339C.



6102 Chromosome 14
  |
* * *
| Chromosome 12 |
5515         5533



A double stranded DNA loop of length 6.466 kilo-bases on
chromosome 12 is bounded on the left by a T1 sequence
whose identifier is 5515. This T1 control element has
the DNA sequence

Seq. Id. = 149 Position = 1 to 225

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 5533. This T2
control element has the DNA sequence

Seq. Id. = 150 Position = 1 to 225

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

YLR467W

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 14 whose identifier is
6102 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YNL339C and has
the DNA sequence

Seq. Id. = 151 Position = 1 to 252

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC
ATCCGGGTAAGAGACAACAGGGCT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 151 Position = 1 to 225

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTATATTGTA
AGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTG
ATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAA
AGACAATCTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 151 Position = 28 to 252

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGAATATGC
GTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAA
ATAAAGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT
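
In this example the T1 match (positions 1 to 225) and the T2 match (positions 28 to 252) share part of the C1/C2 sequence. The short sketch below computes that shared span from the two inclusive ranges. The function name `overlap` is an assumption made for illustration, and the text attaches no stated significance to the size of the overlap.

```python
def overlap(span_a, span_b):
    """Return the inclusive 1-based overlap of two spans, or None if disjoint."""
    start = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])
    return (start, end) if start <= end else None


t1_span = (1, 225)    # T1 match within Seq. Id. 151
t2_span = (28, 252)   # T2 match within Seq. Id. 151

shared = overlap(t1_span, t2_span)
shared_len = shared[1] - shared[0] + 1
print(shared, shared_len)
```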



Example of a multi-celled permanent connectron - C.
elegans

In this example the existence of the T1-T2 (569-596)
long loop is controlled by a C1/C2 short loop (24442).
The expression of this C1/C2 short loop is controlled
only by the gene F20D6.4.

24442 Chromosome 5
  |
* * *
| Chromosome 1 |
569          596

A double stranded DNA loop of length 30.606 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 569. This T1 control element has the
DNA sequence

Seq. Id. = 152 Position = 1 to 39

AAATCGAGCCCGTAAATCGACACAAGCGCTACAGTAGTC

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 596. This T2
control element has the DNA sequence

Seq. Id. = 153 Position = 1 to 42



AGTGCTACAGTAGTCATTTAAAGAATTACTGTAGTTTTCGCT 

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 5 whose identifier is
24442 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene F20D6.4 and has
the DNA sequence

Seq. Id. = 154 Position = 1 to 58

GAGCCCGTAAATCGACACAAGCGCTACAGTAGTCATTTAAAGAATTACTGTAGTTTT
C

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 154 Position = 1 to 34

GAGCCCGTAAATCGACACAAGCGCTACAGTAGTC

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 154 Position = 23 to 58

GCTACAGTAGTCATTTAAAGAATTACTGTAGTTTTC






7. Transient connectrons exist in prokaryotes,
archea, single-celled eukaryotes and multi-celled
eukaryotes

A class of connectron relationships exists that permits one
C1/C2 short loop to control the existence of one or more
T1-T2 long loops such that this C1/C2 short loop is
itself subject to expression control by another T1-T2
long loop which surrounds it. These connectron
relationships are described as "transient". Transient
connectrons exist in prokaryotes, archea, single-celled
eukaryotes and multi-celled eukaryotes.
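
The distinction between the two classes reduces to whether a C1/C2 short loop lies inside some T1-T2 long loop. The following is a minimal sketch of that test. The class names, field names and base coordinates are hypothetical, chosen only to illustrate the rule, and do not come from any genome discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LongLoop:
    chromosome: int
    t1_pos: int   # hypothetical base position of the T1 boundary
    t2_pos: int   # hypothetical base position of the T2 boundary


@dataclass(frozen=True)
class ShortLoop:
    ident: int
    chromosome: int
    pos: int      # hypothetical base position of the C1/C2 short loop


def classify(short, long_loops):
    """Return 'transient' if any long loop surrounds the short loop, else 'permanent'."""
    for loop in long_loops:
        same_chromosome = loop.chromosome == short.chromosome
        if same_chromosome and loop.t1_pos < short.pos < loop.t2_pos:
            return "transient"
    return "permanent"


# Made-up coordinates for illustration only.
loops = [LongLoop(chromosome=1, t1_pos=10_000, t2_pos=60_000)]
inside = ShortLoop(ident=1, chromosome=1, pos=25_000)
outside = ShortLoop(ident=2, chromosome=1, pos=90_000)
elsewhere = ShortLoop(ident=3, chromosome=2, pos=25_000)

print([classify(s, loops) for s in (inside, outside, elsewhere)])
```

A short loop on a different chromosome is never surrounded by the long loop, which is why the third case is classified as permanent in this sketch.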



Example of a prokaryote transient connectron - E. coli

In this example the existence of the T1-T2 (3227-3329)
long loop is controlled by the C1/C2 (3225) short loop.
The expression of this C1/C2 short loop is controlled by
the existence of the T1-T2 (3216-3324) long loop. The
existence of this T1-T2 long loop is itself determined by
the expression of the C1/C2 (3223) short loop. The C1/C2
(3225) short loop is the transient connectron.



3223 Chromosome 1
  |
* * *
| Chromosome 1 |
3216        3324
|    3225      |

3225 Chromosome 1
  |
* * *
| Chromosome 1 |
3227        3329

A double stranded DNA loop of length 93.464 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 3216. This T1 control element has
the DNA sequence

Seq. Id. = 155 Position = 1 to 337

AGCGCAAGCGAAGCTCTTGATCGAAGCCCCGGTAAACGGCGGCCGTAACTATAACGG
TCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCTGCACGAATGGCGTAAT
GATGGCCAGGCTGTCTCCACCCGAGACTCAGTGAAATTGAACTCGCTGTGAAGATGC
AGTGTACCCGCGGCAAGACGGAAAGACCCCGTGAACCTTTACTATAGCTTGACACTG
AACATTGAGCCTTGATGTGTAGGATAGGTGGGAGGCTTTGAAGTGTGGACGCCAGTC
TGCATGGAGCCGACCTTGAAATACCACCCTTTAATGTTTGATGTTCTAACGT

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 3324. This T2
control element has the DNA sequence

Seq. Id. = 156 Position = 1 to 337

CCCGGTAAACGGCGGCCGTAACTATAACGGTCCTAAGGTAGCGAAATTCCTTGTCGG
GTAAGTTCCGACCTGCACGAATGGCGTAATGATGGCCAGGCTGTCTCCACCCGAGAC
TCAGTGAAATTGAACTCGCTGTGAAGATGCAGTGTACCCGCGGCAAGACGGAAAGAC
CCCGTGAACCTTTACTATAGCTTGACACTGAACATTGAGCCTTGATGTGTAGGATAG
GTGGGAGGCTTTGAAGTGTGGACGCCAGTCTGCATGGAGCCGACCTTGAAATACCAC
CCTTTAATGTTTGATGTTCTAACGTTGACCCGTAATCCGGGTTGCGGACAGT

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops





A C1/C2 short loop on chromosome 1 whose identifier is
3225 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene rrlC and has the DNA sequence

Seq. Id. = 157 Position = 1 to 137

AAACAGAATTTGCCTGGCGGCCGTAGCGCGGTGGTCCCACCTGACCCCATGCCGAAC
TCAGAAGTGAAACGCCGTAGCGCCGATGGTAGTGTGGGGTCTCCCCATGCGAGAGTA
GGGAACTGCCAGGCATCAAATTA






The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 



A C1/C2 short loop on chromosome 1 whose identifier is
3323 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene rrlA and has the
DNA sequence

Seq. Id. = 158 Position = 1 to 362

GCGAAGCTCTTGATCGAAGCCCCGGTAAACGGCGGCCGTAACTATAACGGTCCTAAG
GTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCTGCACGAATGGCGTAATGATGGCC
AGGCTGTCTCCACCCGAGACTCAGTGAAATTGAACTCGCTGTGAAGATGCAGTGTAC
CCGCGGCAAGACGGAAAGACCCCGTGAACCTTTACTATAGCTTGACACTGAACATTG
AGCCTTGATGTGTAGGATAGGTGGGAGGCTTTGAAGTGTGGACGCCAGTCTGCATGG
AGCCGACCTTGAAATACCACCCTTTAATGTTTGATGTTCTAACGTAACGTTGACCCG
TAATCCGGGTTGCGGACAGT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 158 Position = 1 to 330

GCGAAGCTCTTGATCGAAGCCCCGGTAAACGGCGGCCGTAACTATAACGGTCCTAAG
GTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCTGCACGAATGGCGTAATGATGGCC
AGGCTGTCTCCACCCGAGACTCAGTGAAATTGAACTCGCTGTGAAGATGCAGTGTAC
CCGCGGCAAGACGGAAAGACCCCGTGAACCTTTACTATAGCTTGACACTGAACATTG
AGCCTTGATGTGTAGGATAGGTGGGAGGCTTTGAAGTGTGGACGCCAGTCTGCATGG
AGCCGACCTTGAAATACCACCCTTTAATGTTTGATGTTCTAACGT

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 158 Position = 21 to 362

CCCGGTAAACGGCGGCCGTAACTATAACGGTCCTAAGGTAGCGAAATTCCTTGTCGG
GTAAGTTCCGACCTGCACGAATGGCGTAATGATGGCCAGGCTGTCTCCACCCGAGAC
TCAGTGAAATTGAACTCGCTGTGAAGATGCAGTGTACCCGCGGCAAGACGGAAAGAC
CCCGTGAACCTTTACTATAGCTTGACACTGAACATTGAGCCTTGATGTGTAGGATAG
GTGGGAGGCTTTGAAGTGTGGACGCCAGTCTGCATGGAGCCGACCTTGAAATACCAC
CCTTTAATGTTTGATGTTCTAACGTTGACCCGTAATCCGGGTTGCGGACAGT

A double stranded DNA loop of length 93.749 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 3227. This T1 control element has
the DNA sequence

Seq. Id. = 159 Position = 1 to 52

AGCGCCGATGGTAGTGTGGGGTCTCCCCATGCGAGAGTAGGGAACTGCCAGG



This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 3329. This T2 
control element has the DNA sequence 

Seq. Id. = 160 Position = 1 to 52

CATGCGAGAGTAGGGAACTGCCAGGCATCAAATAAAACGAAAGGCTCAGTCG

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.



A C1/C2 short loop on chromosome 1 whose identifier is
3225 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene rrlC and has the
DNA sequence

Seq. Id. = 161 Position = 1 to 137

AAACAGAATTTGCCTGGCGGCCGTAGCGCGGTGGTCCCACCTGACCCCATGCCGAAC
TCAGAAGTGAAACGCCGTAGCGCCGATGGTAGTGTGGGGTCTCCCCATGCGAGAGTA
GGGAACTGCCAGGCATCAAATTA

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 161 Position = 76 to 127

AGCGCCGATGGTAGTGTGGGGTCTCCCCATGCGAGAGTAGGGAACTGCCAGG

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 161 Position = 103 to 135

CATGCGAGAGTAGGGAACTGCCAGGCATCAAAT



Example of an archea transient connectron - M. jannaschii

In this example the existence of the T1-T2 (1139-1159)
long loop is controlled by the C1/C2 (533) short loop.
The expression of this C1/C2 short loop is controlled by
the existence of the T1-T2 (532-622) long loop. The
existence of this T1-T2 long loop is itself determined by
the expression of the C1/C2 (1629) short loop. The C1/C2
(533) short loop is the transient connectron.



1629 Chromosome 1
  |
* * *
| Chromosome 1 |
532          622
|    533       |

533 Chromosome 1
  |
* * *
| Chromosome 1 |
1139        1159
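
The chain of control drawn above can be followed programmatically. The sketch below is one hypothetical way to encode it: the two dictionaries and the function `cascade` are assumptions made for illustration, and only the identifiers are taken from this example.

```python
# Short loop id -> long loops (T1 id, T2 id) whose existence it controls.
controls = {1629: [(532, 622)], 533: [(1139, 1159)]}

# Long loop (T1 id, T2 id) -> C1/C2 short loops located inside it.
contains = {(532, 622): [533], (1139, 1159): []}


def cascade(short_id, depth=0, seen=None):
    """Return indented lines tracing control from a short loop downward."""
    seen = set() if seen is None else seen
    if short_id in seen:          # guard against circular control
        return []
    seen.add(short_id)
    lines = ["  " * depth + f"C1/C2 {short_id}"]
    for t1_id, t2_id in controls.get(short_id, []):
        lines.append("  " * (depth + 1) + f"T1-T2 {t1_id}-{t2_id}")
        for inner in contains.get((t1_id, t2_id), []):
            lines.extend(cascade(inner, depth + 2, seen))
    return lines


print("\n".join(cascade(1629)))
```

Starting from short loop 1629, the trace passes through the 532-622 long loop to the transient short loop 533 and ends at the 1139-1159 long loop, which is the order given in the paragraph above.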


A double stranded DNA loop of length 78.672 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 532. This T1 control element has the
DNA sequence

Seq. Id. = 162 Position = 1 to 33

ATATGTTTGAAATTTGAAAATAAGAGTATTTAG

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 622. This T2
control element has the DNA sequence

Seq. Id. = 163 Position = 1 to 47



TTGAAAATAAGAGCATTTAGAAGTTATTAATTAGTTCAAAGGATTTT 



This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 


This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops 

A C1/C2 short loop on chromosome 1 whose identifier is
533 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene MJ0485 and has the DNA sequence

Seq. Id. = 164 Position = 1 to 64



ATTTTTATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTA 
TTGAATT 

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is
1629 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ1597 and has
the DNA sequence

Seq. Id. = 165 Position = 1 to 139

ATATGTTTGAAATTTGAAAATAAGAGTATTTAGAAGTTATTAATTAGTTCAAAGGAT
TTTTATTTAATTTCTAAGGGTTTGCTGGTTTGATTATTTAGAATATTTGAGTTTATT
GAATTATTCAGATTTTTAAAAATTA

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 165 Position = 1 to 33

ATATGTTTGAAATTTGAAAATAAGAGTATTTAG

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 165 Position = 28 to 60

ATTTAGAAGTTATTAATTAGTTCAAAGGATTTT



A double stranded DNA loop of length 14.509 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 1139. This T1 control element has
the DNA sequence

Seq. Id. = 166 Position = 1 to 78

ATTTATTAATTAGTTCAAAGGATTTTTATTTAATTTCTAAGGGTTAGCTGGTTTGAT
TGTTTAAAATATTTGAGTTTA

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 1159. This T2
control element has the DNA sequence

Seq. Id. = 167 Position = 1 to 78

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAAT
TATTCAGATTTTTAAAAATTA

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

MJ1096 MJ1097 tRNA-Arg-3 MJ1098 MJ1099
MJ1100 MJ1101 MJ1102 MJ1103 MJ1104
MJ1105 MJ1106 MJ1107 MJ1108



The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is
533 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ0485 and has
the DNA sequence

Seq. Id. = 168 Position = 1 to 64

ATTTTTATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTA
TTGAATT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 168 Position = 1 to 37

ATTTTTATTTAATTTCTAAGGGTTAGCTGGTTTGATT

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 168 Position = 7 to 64

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAAT
T



Example of a single-celled transient connectron - 
cervesiae 



S. 



In this example the existence of the T1-T2 (2840-2859) 
long loop is controlled by the C1/C2 (298) short loop. 
The expression of this C1/C2 short loop is controlled by 
the existence of the T1-T2 (293-320) long loop. The 
existence of this T1-T2 long loop is itself determined by
the expression of the C1/C2 (86) short loop. The C1/C2 
(298) short loop is the transient connectron. 
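
The chain of control described in this example can be written down as a small data structure. The following is a minimal sketch, assuming that a short loop is "transient" when it is expressed from inside a long loop that some other short loop controls, and when it in turn controls a different long loop. That working definition, the LongLoop class, and the function name transient_short_loops are illustrative assumptions, not definitions taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class LongLoop:
    t1: int                                               # identifier of the T1 control element
    t2: int                                               # identifier of the T2 control element
    controlled_by: Set[int] = field(default_factory=set)  # C1/C2 loops that control this loop
    contains: Set[int] = field(default_factory=set)       # C1/C2 loops expressed from inside it


def transient_short_loops(loops: List[LongLoop]) -> Set[int]:
    """Return C1/C2 identifiers that sit inside an externally controlled
    long loop and that control some other long loop (assumed definition)."""
    found = set()
    for host in loops:
        # The host loop must be controlled by a short loop that lies outside it.
        if not (host.controlled_by - host.contains):
            continue
        for short_id in host.contains:
            if any(short_id in other.controlled_by
                   for other in loops if other is not host):
                found.add(short_id)
    return found


# Identifiers from the S. cerevisiae example above.
loops = [
    LongLoop(293, 320, controlled_by={86}, contains={298}),
    LongLoop(2840, 2859, controlled_by={298}),
]
transient = transient_short_loops(loops)
print(sorted(transient))
```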



              86   Chromosome 1
              |
            * * *
|          Chromosome 2          |
293                            320
|              298               |

              298  Chromosome 2
              |
            * * *
|          Chromosome 7          |
2840                          2859

A double stranded DNA loop of length 38.470 kilo-bases on 
chromosome 2 is bounded on the left by a T1 sequence
whose identifier is 293. This T1 control element has the
DNA sequence



Seq. Id. = 169 Position = 1 to 258

GAATTGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTATCAATA 
TATTATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAAGCTGTCATCGAAG
TTAGAGGAAGCTGAAGTGCAAGGATTGATAATGTAATAGGATAATGAAACATATAAA
ACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGG
ATTCCTATATCCTTGAGGAGAACTTCTAGT 



This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 320. This T2
control element has the DNA sequence

Seq. Id. = 170 Position = 1 to 77

AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAG
AACTTCTAGTATATTCTGTA






This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 



YBL005W-B TS(AGA)B YBL004W YBL003C YBL002W
YBL001C YBR001C YBR002C YBR003W YBR004C
YBR005W YBR006W YBR007C YBR008C YBR009C
YBR010W YBR011C YBR012C



This long T1/T2 double stranded DNA loop modulates the 
expression of the following C1/C2 short loops 



A C1/C2 short loop on chromosome 2 whose identifier is 
298 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene YBL005W-B and has the DNA sequence



Seq. Id. = 171 Position = 1 to 342



ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCCACTATCGTCTATCAACTA 
ATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGATGACATAAGTTAT 
GAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAGTGCAAGGATTGATAATGTAATAG 
GATAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAAT 
ATAGATTCCATTTTGAGGATTCCTATATCCTTGAGGAGAACTTCTAGTATATTCTGT
ATACCTAATATTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAACATTC 

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 


A C1/C2 short loop on chromosome 1 whose identifier is 86 
controls the expression of the genes in this T1/T2 long 
loop. This C1/C2 short loop is expressed as an RNA single
strand that is 3'UTR to the gene YAR009C and has the DNA
sequence

Seq. Id. = 172 Position = 1 to 362

ATCTATTACATTATGGGTGGTATGTTGGAATAGAAATCAACTATCATCTACTAACTA 
GTATTTACATTACTAGTATATTATCATATACGGTGTTAGAAGATGACGCAAATGATG
AGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCAAGGATTGATAATGTAATAGG 
ATCAATGAATATAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTA 
GAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAGAACTTCTAGTATAT 
TCTGTATACCTAATATTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAA 
CATTCACCCATTTCTCAGAA

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 172 Position = 184 to 264



AAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATT 
CCATTTTGAGGATTCCTATATCCT 

The match between the T2 sequence and the C1/C2 sequence 
is

Seq. Id. = 172 Position = 215 to 291

AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAG
AACTTCTAGTATATTCTGTA



A double stranded DNA loop of length 5.302 kilo-bases on 
chromosome 7 is bounded on the left by a T1 sequence
whose identifier is 2840. This T1 control element has
the DNA sequence

Seq. Id. = 173 Position = 1 to 313

TCTGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTATCAATATA 
TTATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAAGCTGTCATCGAAGTT 
AGAGGAAGCTGAAACGCAAGGATTGATAATGTAATAGGATCAATGAATATAAACATA 
TAAAACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTT 
GAGGATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTATACCTAAATTATA
GCCTTTATCAACAATGGAATCCCAACAA 

This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 2859. This T2 
control element has the DNA sequence

Seq. Id. = 174 Position = 1 to 314 



CTATCAACTAATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGATGA 
CATAAGTTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAACGCAAGGATTGAT 
AATGTAATAGGATCAATGAATATAAACATATAAAACGGAATGAGGAATAATCGTAAT 
ATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAGAAC
TTCTAGTATATTCTGTATACCTAATATTATAGCCTTTATCAACAATGGAATCCCAAC 
AATTATCTCAACATTCACATATTTCTCAT 

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 2 whose identifier is 
298 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YBL005W-B and has
the DNA sequence

Seq. Id. = 175 Position = 1 to 342

ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCCACTATCGTCTATCAACTA
ATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGATGACATAAGTTAT 
GAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAGTGCAAGGATTGATAATGTAATAG 
GATAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAAT 
ATAGATTCCATTTTGAGGATTCCTATATCCTTGAGGAGAACTTCTAGTATATTCTGT
ATACCTAATATTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAACATTC

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 175 Position = 23 to 147



TGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTATCAATATATT 
ATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAAGCTGTCATCGAAGTTAG
AGGAAGCTGAA

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 175 Position = 48 to 146

CTATCAACTAATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGATGA
CATAAGTTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAA



Example of a multi-celled transient connectron - C. elegans

In this example the existence of the T1-T2 (22072-22108) 
long loop is controlled by the C1/C2 (125) short loop. 
The expression of this C1/C2 short loop is controlled by
the existence of the T1-T2 (110-129) long loop. The 
existence of this T1-T2 long loop is itself determined by 
the expression of the C1/C2 (16859) short loop. The 
C1/C2 (125) short loop is the transient connectron. 

             16859  Chromosome 4
              |
            * * *
|          Chromosome 1          |
110                            129
|              125               |

              125  Chromosome 1
              |
            * * *
|          Chromosome 5          |
22072                        22108

A double stranded DNA loop of length 18.855 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 110. This T1 control element has the
DNA sequence 

Seq. Id. = 176 Position = 1 to 33

AGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGC 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 129. This T2
control element has the DNA sequence

Seq. Id. = 177 Position = 1 to 123

TTCTCCCGCATTTTTTGTAGATCTACGTAGATCAAACCGAAATGAGGCACTTTCTGA 
ATCCACGAGCTAGGCTTAAGCTTAGGCTTAAGCTTAGGCCTTTTCTCAGGCTTAGGC 
TTAGGCTTA 

This long T1/T2 double stranded DNA loop modulates the
expression of the following genes

ZC123.3 ZC123.2

This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops 

A C1/C2 short loop on chromosome 1 whose identifier is 
125 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene ZC123.3 and has the DNA sequence

Seq. Id. = 178 Position = 1 to 89

ACGCGCCGTAAATCTACCCCAGATATGGCCGAGCCAAAATGGCCTAGTTCGGCAAAC 
TCTTTCATTTCAATTTATGAGGGAAGCCAGAA 

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 4 whose identifier is 
16859 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene F58E2.7 and has
the DNA sequence

Seq. Id. = 179 Position = 1 to 166

CTTAGGCTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGCTT 
AAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGG 
CTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGACTTA 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 179 Position = 11 to 43

AGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGC



The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 179 Position = 3 to 33

TAGGCTTAAGCTTAGGCTTAAGCTTAGGC 



A double stranded DNA loop of length 51.031 kilo-bases on
chromosome 5 is bounded on the left by a T1 sequence
whose identifier is 22072. This T1 control element has
the DNA sequence

Seq. Id. = 180 Position = 1 to 57

CGCAACGCGCCGTAAATCTACCCCAGATATGGCCGAGCCAAAATGACCTAGTTCGGC 

This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 22108. This T2
control element has the DNA sequence

Seq. Id. = 181 Position = 1 to 170

TGACAATCGCCTGCCGGACAACGCGTGGAAAAGTGTCGTGTACTCCACACGGACAAA
TACATTTAGTTTTACAACTAAAATCGAACCGCGACGCGACACGCAACGCGACGTAAA 
TCTACCCCAGATATGGCCGAGCCAAAATGGCCTAGTTCGGCAAACTCTTCTATTTC 

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes

F36H9.3 F36H9.4 F36H9.5 F36H9.2 F36H9.1
F36H9.6

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 
125 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene ZC123.3 and has
the DNA sequence

Seq. Id. = 182 Position = 1 to 89

ACGCGCCGTAAATCTACCCCAGATATGGCCGAGCCAAAATGGCCTAGTTCGGCAAAC
TCTTTCATTTCAATTTATGAGGGAAGCCAGAA 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 182 Position = 1 to 41

ACGCGCCGTAAATCTACCCCAGATATGGCCGAGCCAAAATG 

The match between the T2 sequence and the C1/C2 sequence
is 

Seq. Id. = 182 Position = 7 to 61 



CGTAAATCTACCCCAGATATGGCCGAGCCAAAATGGCCTAGTTCGGCAAACTCTT



8. Self-limiting connectrons occur in prokaryotes, archaea,
single-celled eukaryotes and multi-celled eukaryotes

A class of connectron relationships exists that permits one
C1/C2 short loop to control the existence of the T1-T2
long loop that surrounds it. These connectron
relationships are described as "self-limiting". Self-limiting
connectrons exist in prokaryotes, archaea,
single-celled eukaryotes and multi-celled eukaryotes.
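
The self-limiting relationship can be tested mechanically once each long loop is annotated with the short loops that control it and the short loops that it surrounds. The following is a minimal sketch under that assumption. The function name self_limiting and the two dictionaries are hypothetical, and the identifiers are taken from the E. coli and C. elegans examples in this section.

```python
from typing import Dict, Set, Tuple

Loop = Tuple[int, int]  # (T1 identifier, T2 identifier)


def self_limiting(controlled_by: Dict[Loop, Set[int]],
                  surrounds: Dict[Loop, Set[int]]) -> Dict[Loop, Set[int]]:
    """For each long loop, return the C1/C2 short loops that both control
    the loop and are surrounded by it."""
    result = {}
    for loop, controllers in controlled_by.items():
        both = controllers & surrounds.get(loop, set())
        if both:
            result[loop] = both
    return result


# Which short loops control each long loop.
controlled_by = {
    (1704, 1718): {1705, 1713},   # E. coli
    (17154, 17190): {17155},      # C. elegans
}
# Which short loops lie inside each long loop.
surrounds = {
    (1704, 1718): {1705, 1713},
    (17154, 17190): {17155, 17171},
}

found = self_limiting(controlled_by, surrounds)
for loop, ids in sorted(found.items()):
    print(loop, sorted(ids))
```

In the C. elegans case the short loop 17171 lies inside the long loop but is not listed as controlling it, so only 17155 is reported.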

Example of a prokaryotic self-limiting connectron - E. coli

In this example the existence of the T1-T2 (1704-1718) 
long loop is controlled by two C1/C2 (1705 and 1713) 
short loops. The expression of these C1/C2 short loops 
is controlled by the existence of the T1-T2 (1704-1718)
long loop. The existence of this T1-T2 long loop
is itself determined by the expression of the two C1/C2
(1705 and 1713) short loops. The C1/C2 (1705 and 1713)
short loops are the self-limiting connectrons.

              1705  Chromosome 1
              1713  Chromosome 1
              |
            * * *
|          Chromosome 1          |
1704                          1718
|        1705        1713        |



A double stranded DNA loop of length 15.259 kilo-bases on 
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 1704. This T1 control element has
the DNA sequence

Seq. Id. = 183 Position = 1 to 71

CGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATC 
CGTATGTCACTGGT 



This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 1718. This T2
control element has the DNA sequence

Seq. Id. = 184 Position = 1 to 71

TTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTCCAGTCA
GAGGAGCCAAATTC 

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 

asnT b1978 b1979 b1980 shiA
amn b1983 asnW yeeO asnU

This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops 



A C1/C2 short loop on chromosome 1 whose identifier is 
1705 controls the expression of the genes of one or more 
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene and has the DNA sequence

Seq. Id. = 185 Position = 1 to 98

CGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATC
CGTATGTCACTGGTTCGAGTCCAGTCAGAGGAGCCAAATTC

A C1/C2 short loop on chromosome 1 whose identifier is 
1713 controls the expression of the genes of one or more 
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene asnW and has the DNA sequence

Seq. Id. = 186 Position = 1 to 86

CACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACT
GGTTCGAGTCCAGTCAGAGGAGCCAAATT 

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 


A C1/C2 short loop on chromosome 1 whose identifier is 
1705 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene and has the DNA
sequence

Seq. Id. = 187 Position = 1 to 98

CGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATC
CGTATGTCACTGGTTCGAGTCCAGTCAGAGGAGCCAAATTC






The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 187 Position = 1 to 71

CGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATC 
CGTATGTCACTGGT 

The match between the T2 sequence and the C1/C2 sequence 
is

Seq. Id. = 187 Position = 28 to 98

TTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTCCAGTCA
GAGGAGCCAAATTC

A C1/C2 short loop on chromosome 1 whose identifier is 
1713 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene asnW and has the
DNA sequence

Seq. Id. = 188 Position = 1 to 86

CACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACT
GGTTCGAGTCCAGTCAGAGGAGCCAAATT 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 188 Position = 1 to 60

CACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACT
GGT

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 188 Position = 17 to 86

TTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTCCAGTCA
GAGGAGCCAAATT

Example of an archaeal self-limiting connectron - M. jannaschii

In this example the existence of the T1-T2 (1447-1471)
long loop is controlled by two C1/C2 (1448 and 1470)
short loops. The expression of these C1/C2 short loops
is controlled by the existence of the T1-T2 (1447-1471)
long loop. The existence of this T1-T2 long loop
is itself determined by the expression of the two C1/C2
(1448 and 1470) short loops. The C1/C2 (1448 and 1470)
short loops are the self-limiting connectrons.



              1448  Chromosome 1
              1470  Chromosome 1
              |
            * * *
|          Chromosome 1          |
1447                          1471
|        1448        1470        |



A double stranded DNA loop of length 22.675 kilo-bases on 
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 1447. This T1 control element has
the DNA sequence

Seq. Id. = 189 Position = 1 to 95



TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTACCATTA 
CTTGGAAATCTATTTAAAACCTCTTTAATCTTATGATA 

This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 1471. This T2 
control element has the DNA sequence 



Seq. Id. = 190 Position = 1 to 95 



CAACTAACAACCGTATCGAATTTACCATTACTTGGAAATCTATTTAAAACCTCTTTA 
ATCTTGTGATAATAAATTCTAATCGATTCGTGACTTAT 

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 



MJ1402 MJ1403 MJ1404 MJ1405 MJ1406
MJ1407 MJ1408 MJ1409 MJ1410 MJ1411
MJ1412 MJ1413 MJ1414 MJ1415 MJ1416
MJ1417 MJ1418 MJ1419 MJ1420



This long T1/T2 double stranded DNA loop modulates the 
expression of the following C1/C2 short loops 

A C1/C2 short loop on chromosome 1 whose identifier is 
1448 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene MJ1401 and has the DNA sequence

Seq. Id. = 191 Position = 1 to 122

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTACCATTA 
CTTGGAAATCTATTTAAAACCTCTTTAATCTTATGATAATAAATTCTAATCGATTCG 
TGACTTAT 


A C1/C2 short loop on chromosome 1 whose identifier is 
1470 controls the expression of the genes of one or more 
other T1/T2 long loops. This C1/C2 short loop is 
expressed as an RNA single strand that is 3'UTR to the
gene MJ1420 and has the DNA sequence

Seq. Id. = 192 Position = 1 to 116

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTACCATTA
CTTGGAAATCTATTTAAAACCTCTTTAATCTTGTGATAATAAATTCTAATCGATTCG
TG 

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 


A C1/C2 short loop on chromosome 1 whose identifier is 
1470 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ1420 and has
the DNA sequence

Seq. Id. = 193 Position = 1 to 116



TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTACCATTA 
CTTGGAAATCTATTTAAAACCTCTTTAATCTTGTGATAATAAATTCTAATCGATTCG 
TG 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 193 Position = 1 to 89

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTACCATTA 
CTTGGAAATCTATTTAAAACCTCTTTAATCTT 

The match between the T2 sequence and the C1/C2 sequence 
is

Seq. Id. = 193 Position = 28 to 116

CAACTAACAACCGTATCGAATTTACCATTACTTGGAAATCTATTTAAAACCTCTTTA
ATCTTGTGATAATAAATTCTAATCGATTCGTG

A C1/C2 short loop on chromosome 1 whose identifier is 
1448 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ1401 and has
the DNA sequence

Seq. Id. = 194 Position = 1 to 122

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTACCATTA
CTTGGAAATCTATTTAAAACCTCTTTAATCTTATGATAATAAATTCTAATCGATTCG 
TGACTTAT 



The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 194 Position = 1 to 95

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTACCATTA 
CTTGGAAATCTATTTAAAACCTCTTTAATCTTATGATA 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 194 Position = 29 to 99

CAACTAACAACCGTATCGAATTTACCATTACTTGGAAATCTATTTAAAACCTCTTTA
ATCTT 



Example of a single-celled self-limiting connectron - S. cerevisiae

In this example the existence of the T1-T2 (293-320) 
long loop is controlled by the C1/C2 (298) short loop. The
expression of this C1/C2 short loop is controlled by the
existence of the T1-T2 (293-320) long loop. The 
existence of this T1-T2 long loop is itself determined by 
the expression of the C1/C2 (298) short loop. The C1/C2 
(298) short loop is the self-limiting connectron. 

              298  Chromosome 2
              |
            * * *
|          Chromosome 2          |
293                            320
|              298               |



A double stranded DNA loop of length 38.470 kilo-bases on 
chromosome 2 is bounded on the left by a T1 sequence
whose identifier is 293. This T1 control element has the
DNA sequence

Seq. Id. = 195 Position = 1 to 258

GAATTGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTATCAATA 
TATTATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAAGCTGTCATCGAAG 
TTAGAGGAAGCTGAAGTGCAAGGATTGATAATGTAATAGGATAATGAAACATATAAA 
ACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGG 
ATTCCTATATCCTTGAGGAGAACTTCTAGT 

This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 320. This T2 
control element has the DNA sequence 

Seq. Id. = 196 Position = 1 to 77

AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAG 
AACTTCTAGTATATTCTGTA 
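
Each listing pairs a header of the form "Seq. Id. = N Position = a to b" with one or more sequence lines, so the stated span can be checked against the number of bases actually listed. The following is a minimal sketch, assuming the header has already been normalized to exactly that form. The function name check_entry and the regular expression are hypothetical helpers for illustration.

```python
import re

# Matches a normalized header such as "Seq. Id. = 196 Position = 1 to 77".
HEADER = re.compile(r"Seq\. Id\. = (\d+) Position = (\d+) to (\d+)")


def check_entry(header, sequence_lines):
    """Return (sequence id, span implied by the header, bases actually listed)."""
    m = HEADER.fullmatch(header.strip())
    if m is None:
        raise ValueError("header is not in the normalized form: %r" % header)
    seq_id, start, end = (int(g) for g in m.groups())
    sequence = "".join(line.strip() for line in sequence_lines)
    return seq_id, end - start + 1, len(sequence)


entry = check_entry(
    "Seq. Id. = 196 Position = 1 to 77",
    [
        "AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAG",
        "AACTTCTAGTATATTCTGTA",
    ],
)
print(entry)
```

When the second and third values differ, either the header or the sequence lines were damaged in transcription and the entry deserves a manual look.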

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 

YBL005W-B TS(AGA)B YBL004W YBL003C YBL002W
YBL001C YBR001C YBR002C YBR003W YBR004C
YBR005W YBR006W YBR007C YBR008C YBR009C
YBR010W YBR011C YBR012C



This long T1/T2 double stranded DNA loop modulates the 
expression of the following C1/C2 short loops

A C1/C2 short loop on chromosome 2 whose identifier is 
298 controls the expression of the genes of one or more 
other T1/T2 long loops. This C1/C2 short loop is 
expressed as an RNA single strand that is 3'UTR to the
gene YBL005W-B and has the DNA sequence

Seq. Id. = 197 Position = 1 to 342

ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCCACTATCGTCTATCAACTA
ATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGATGACATAAGTTAT 
GAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAGTGCAAGGATTGATAATGTAATAG 
GATAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAAT 
ATAGATTCCATTTTGAGGATTCCTATATCCTTGAGGAGAACTTCTAGTATATTCTGT
ATACCTAATATTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAACATTC

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 2 whose identifier is
298 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YBL005W-B and has
the DNA sequence

Seq. Id. = 198 Position = 1 to 342



ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCCACTATCGTCTATCAACTA 
ATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGATGACATAAGTTAT 
GAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAGTGCAAGGATTGATAATGTAATAG 
GATAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAAT 
ATAGATTCCATTTTGAGGATTCCTATATCCTTGAGGAGAACTTCTAGTATATTCTGT
ATACCTAATATTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAACATTC 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 198 Position = 23 to 276

TGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTATCAATATATT 
ATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAAGCTGTCATCGAAGTTAG 
AGGAAGCTGAAGTGCAAGGATTGATAATGTAATAGGATAATGAAACATATAAAACGG
AATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTC 
CTATATCCTTGAGGAGAACTTCTAGT 

The match between the T2 sequence and the C1/C2 sequence 
is

Seq. Id. = 198 Position = 210 to 259

AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCT



Example of a multi-celled self-limiting connectron - C. elegans

In this example the existence of the T1-T2 (17154-17190)
long loop is controlled by the C1/C2 (17155) short loop. The
expression of this C1/C2 short loop is controlled by the
existence of the T1-T2 (17154-17190) long loop. The
existence of this T1-T2 long loop is itself determined by
the expression of the C1/C2 (17155) short loop. The C1/C2
(17155) short loop is the self-limiting connectron.

             17155  Chromosome 4
              |
            * * *
|          Chromosome 4          |
17154                        17190
|             17155              |

A double stranded DNA loop of length 89.919 kilo-bases on 
chromosome 4 is bounded on the left by a T1 sequence
whose identifier is 17154. This T1 control element has
the DNA sequence

Seq. Id. = 199 Position = 1 to 29

AAATTTCCGGCAAATCGGCAAACTGGCAA 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 17190. This T2
control element has the DNA sequence

Seq. Id. = 200 Position = 1 to 29

AATTTGCCGATTTGCCGAATTTGTCGACA 

This long T1/T2 double stranded DNA loop modulates the 
expression of the following genes 

R08C7.11 M01H9.2 M01H9.3 M01H9.4 M01H9.1
ZK180.1 ZK180.2 ZK180.3 ZK180.4 ZK180.5
ZK180.6 ZK185.3 ZK185.2



This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops

A C1/C2 short loop on chromosome 4 whose identifier is
17155 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene R08C7.1 and has the DNA sequence

Seq. Id. = 201 Position = 1 to 56

AAATTTCCGGCAAATCGGCAAACTGGCAATTTGCCGATTTGCCGAATTTGTCGACA

A C1/C2 short loop on chromosome 4 whose identifier is
17171 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the
gene ZK180.2 and has the DNA sequence

Seq. Id. = 202 Position = 1 to 56

TGGAAATTTCAGAATTTCAATTTTAATCGGCAAAATTGTACGCATCCTATGAATTT

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.



A C1/C2 short loop on chromosome 4 whose identifier is 
17155 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene R08C7.1 and has
the DNA sequence

Seq. Id. = 203 Position = 1 to 56

AAATTTCCGGCAAATCGGCAAACTGGCAATTTGCCGATTTGCCGAATTTGTCGACA 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 203 Position = 1 to 29

AAATTTCCGGCAAATCGGCAAACTGGCAA

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 203 Position = 28 to 56

AATTTGCCGATTTGCCGAATTTGTCGACA 



9. Geneless connectrons exist in single-celled 
and multi-celled eukaryotes 



Normally T1-T2 long loops contain genes whose expression 
is regulated by the existence of the long loop. When a 
T1-T2 long loop does not contain any genes it is 
described as being "geneless". The existence of the
T1-T2 long loop is itself controlled by one or more C1/C2
short loops that may be on the same or different 
chromosomes. The geneless T1-T2 long loops must contain 
one or more C1/C2 short loops. 
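
The geneless condition amounts to two checks on a long loop: it lists no genes, and it surrounds at least one C1/C2 short loop. The following is a minimal sketch of that test. The LongLoop record and the is_geneless function are hypothetical names, and the identifiers are taken from the S. cerevisiae geneless example and the C. elegans self-limiting example in this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LongLoop:
    t1: int                                               # identifier of the T1 control element
    t2: int                                               # identifier of the T2 control element
    genes: List[str] = field(default_factory=list)        # genes lying inside the loop
    short_loops: List[int] = field(default_factory=list)  # C1/C2 loops lying inside the loop


def is_geneless(loop: LongLoop) -> bool:
    """A loop is treated as geneless when it holds no genes but at least one C1/C2 short loop."""
    return not loop.genes and len(loop.short_loops) >= 1


# S. cerevisiae T1-T2 (1537-1559): no genes, short loops 1538 through 1558.
yeast_loop = LongLoop(1537, 1559, short_loops=list(range(1538, 1559)))

# C. elegans T1-T2 (17154-17190): contains genes, so it is not geneless.
worm_loop = LongLoop(17154, 17190,
                     genes=["M01H9.2", "ZK180.2"],
                     short_loops=[17155, 17171])

print(is_geneless(yeast_loop), len(yeast_loop.short_loops))
print(is_geneless(worm_loop))
```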

Example of a single-celled geneless connectron - S. cerevisiae

In this example the existence of the T1-T2 (1537-1559) 
long loop is controlled by three C1/C2 (3789, 5289 and 
5753) short loops. The expression of 21 C1/C2 (1538
through 1558) short loops is controlled by the existence
of the T1-T2 (1537-1559) long loop. 

              3789  Chromosome 9
              5289  Chromosome 12
              5753  Chromosome 13
              |
            * * *
|          Chromosome 4          |
1537                          1559
|       1538 through 1558        |

A double stranded DNA loop of length 4.825 kilo-bases on
chromosome 4 is bounded on the left by a T1 sequence
whose identifier is 1537. This T1 control element has
the DNA sequence

Seq. Id. = 204 Position = 1 to 362

ATGAGATATATGTGGGTAATTAGATAATTGTTGGGATTCCATTGTTGATAAAGGCTA 
TAATATTAGGTATACAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATA 
AAAGGGAATCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATG 
TTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATT 
TAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGATAATATAC
TAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATTCCAACATACCACC 
CATAATGTAATAGATCTAAT 

This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 1559. This T2
control element has the DNA sequence

Seq. Id. = 205 Position = 1 to 362

ATGAGATATATGTGGGTAATTAGATAATTGTTGGGATTCCATTGTTGATAAAGGCTA
TAATATTAGGTATACAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATA 
AAAGGGAATCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATG 
TTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATT 
TAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGATAATATAC
TAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATTCCAACATACCACC
CATAATGTAATAGATCTAAT 

There are no genes controlled by this T1/T2 loop. 



This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops 



A C1/C2 short loop on chromosome 4 whose identifier is 
1538 controls the expression of the genes of one or more 
other T1/T2 long loops. This C1/C2 short loop has the 
DNA sequence 

Seq. Id. = 206 Position = 1 to 387

ATGAGATATATGTGGGTAATTAGATAATTGTTGGGATTCCATTGTTGATAAAGGCTA 
TAATATTAGGTATACAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATA 
AAAGGGAATCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATG 
TTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATT 
TAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGATAATATAC 
TAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATTCCAACATACCACC 
CATAATGTAATAGATCTAATGAATCCATTTGTTTGTTAATAGTTT 

This T1-T2 loop also modulates the C1/C2 short loops 
numbered 1539 to 1557 

A C1/C2 short loop on chromosome 4 whose identifier is 
1558 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the 
DNA sequence 

Seq. Id. = 207 Position = 1 to 307

AGCTTCTCATAACTTATGTCATCATCTTAACACCGTATATGATAATATATTGATAAT 
ATAACTTGTTGGAATAAAAATCAACTATCATCTACTAACTAGTATTTACGTTACTAG 
TATATTATCATATACGGTGTTAGAAGATGACGCAAATGATGAGAAATAGTCATCTAA 
ATTAGTGGAAGCTGA . . . GTCTATCTGGCGAATATAAATTTTTACGCTACACACGTC 
ATCGACATCTAAATATGACAGTCGCTGAACTGTTCTTAGATATCCATGCTATTTATG
AAGAACAACAGGGATCGAGAAACAG 



The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 9 whose identifier is 
3789 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YIL059C and has
the DNA sequence

Seq. Id. = 208 Position = 1 to 176

TTTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTC 
CACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGA 
TAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATTCCAAC 
AGTAT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 208 Position = 1 to 172

TTTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTC 
CACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGA 
TAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATTCCAAC 
A

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 208 Position = 1 to 172



TTTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTC 
CACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGA 
TAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATTCCAAC 
A 


A C1/C2 short loop on chromosome 12 whose identifier is 
5289 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YLR301W and has
the DNA sequence

Seq. Id. = 209 Position = 1 to 325

GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGT 
ATGTAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCT
GCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGTTAATATTCAT 
TGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTAT 
TTCTCATCATTTGCGTCATCTTCTAACACCGTATATGATAATATACTAGTAACGTAA 
ATACTAGTTAGTAGATGATAGTTGATTTTTATTCCAACAC 


The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 209 Position = 62 to 317

AGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCTGCAA 
TTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGTTAATATTCATTGAT 
CCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCT 
CATCATTTGCGTCATCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATAC 
TAGTTAGTAGATGATAGTTGATTTTTATTCCAACA



The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 209 Position = 86 to 324

AGGATTTAGGAATCCATAAAAGGGAATCTGCAATTCTACACAATTCTATAAATATTA 
TTATCATCGTTTTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCG 
TTTCAGCTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACA 
CCGTATATGATAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTT 
TTATTCCAACA

A C1/C2 short loop on chromosome 13 whose identifier is 
5753 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YMR044W and has
the DNA sequence

Seq. Id. = 210 Position = 1 to 334

TTGAGAAATGGGGGAATGTTGAGATAATTGTTGGGATTCCATTGTTGATAAAGGCTA
TAATATTAGGTATACAGAATATACTAGAAGTTCTCCTCAAGGATATAGGAATCCTCA 
AAATGGAATCTATATTTCTACATACTAATATTACGATTATTCCTCATTCCGTTTTAT 
ATGTTTCATTATCCTATTACATTATCAATCCTTGCACTTCAGCTTCCTCTAACTTCG 
ATGACAGCTTCTCATAACTTATGTCATCATCTTAACACCGTATATGATAATATATTG
ATAATATAACTATTAGTTGATAGACGATAGTGGATTTTTATTCCAACAT

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 210 Position = 22 to 95



AGATAATTGTTGGGATTCCATTGTTGATAAAGGCTATAATATTAGGTATACAGAATA 
TACTAGAAGTTCTCCTC 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 210 Position = 28 to 101

TTGTTGGGATTCCATTGTTGATAAAGGCTATAATATTAGGTATACAGAATATACTAG 
AAGTTCTCCTCAAGGAT



Two examples of multi-celled geneless connectrons - C.
elegans



In the first example the existence of the T1-T2 (2342-2344)
long loop is controlled by the C1/C2 (24114) short
loop. The expression of one C1/C2 (2343) short loop is
controlled by the existence of the T1-T2 (2342-2344)
long loop.



24114 Chromosome 5
|

* * *

| Chromosome 1 |

2342 2344
| 2343 |

In the second example the existence of the T1-T2 (29221-29262)
long loop is controlled by the C1/C2 (4291) short
loop. The expression of the C1/C2 (29222 through 29261) short
loops is controlled by the existence of the T1-T2 (29221-29262)
long loop.



4291 Chromosome 1
|

* *

| Chromosome 5 |

29221 29262

| 29222 through 29261 |



A double stranded DNA loop of length 67.059 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 2342. This T1 control element has
the DNA sequence



Seq. Id. = 211 Position = 1 to 37

TGAAAACTACAGTAATTCTTTAAATGACTACTGTAGC



This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 2344. This T2 
control element has the DNA sequence 

Seq. Id. = 212 Position = 1 to 37



CTACTGTAGCGCTTGTGTCGATTTACGGGCTCGATTT 



There are no genes controlled by this T1/T2 loop.



This long T1/T2 double stranded DNA loop modulates the 
expression of the following C1/C2 short loops 



A C1/C2 short loop on chromosome 1 whose identifier is 
2343 controls the expression of the genes of one or more 
other T1/T2 long loops. This C1/C2 short loop has the 
DNA sequence 

Seq. Id. = 213 Position = 1 to 61

TCGACACAAGCGCTACAGTAGCTATTTAAAGAATTACTGTAGTTTTCGCTACGAGAT 
ATTT 

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.

A C1/C2 short loop on chromosome 5 whose identifier is
24114 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene C13F10.5 and has
the DNA sequence

Seq. Id. = 214 Position = 1 to 68

GCGAAAACTACAGTAATTCTTTAAATGACTACTGTAGCGCTTGTGTCGATTTACGGG 
CTCGATTTTCG 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 214 Position = 3 to 38



GAAAACTACAGTAATTCTTTAAATGACTACTGTAGC 



The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 214 Position = 29 to 65

CTACTGTAGCGCTTGTGTCGATTTACGGGCTCGATTT 
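
The reported match positions can be checked mechanically. The following is a minimal sketch, assuming the "match" between a target sequence and the C1/C2 sequence is their longest common substring and that positions are 1-based and inclusive; the function names are hypothetical and are not taken from the application.

```python
def longest_common_substring(a, b):
    """Return (length, start_in_a, start_in_b) of the longest shared run.

    Starts are 0-based. Simple O(len(a) * len(b)) dynamic programme.
    """
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a = i - best_len
                    best_b = j - best_len
        prev = curr
    return best_len, best_a, best_b


def match_positions(target, control):
    """1-based inclusive (first, last) position of the match within `control`."""
    length, _, start = longest_common_substring(target, control)
    return start + 1, start + length


T1 = "TGAAAACTACAGTAATTCTTTAAATGACTACTGTAGC"            # Seq. Id. 211
T2 = "CTACTGTAGCGCTTGTGTCGATTTACGGGCTCGATTT"            # Seq. Id. 212
C1C2 = ("GCGAAAACTACAGTAATTCTTTAAATGACTACTGTAGCGCTTGTGTCGATTTACGGG"
        "CTCGATTTTCG")                                    # Seq. Id. 214

print(match_positions(T1, C1C2))
print(match_positions(T2, C1C2))
```

Under these assumptions the sketch reproduces the positions listed for Seq. Id. 214. Note that the two matched regions overlap within the C1/C2 sequence.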



A double stranded DNA loop of length 41.297 kilo-bases on
chromosome 5 is bounded on the left by a T1 sequence
whose identifier is 29221. This T1 control element has
the DNA sequence

Seq. Id. = 215 Position = 1 to 62

TTTAAATTTCCCGCCAAAAATTGACTGAAAATTTGGATTTTCTTTCCAAAAATTGAC 
AGAAA 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 29262. This T2
control element has the DNA sequence

Seq. Id. = 216 Position = 1 to 31

TGAAAATTTGAATTTCCCGCCAAAAATTAAC 

There are no genes controlled by this T1/T2 loop. 

This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops



A C1/C2 short loop on chromosome 5 whose identifier is 
29222 controls the expression of the genes of one or more 
other T1/T2 long loops. This C1/C2 short loop has the 
DNA sequence 

Seq. Id. = 217 Position = 1 to 58

AATTTCCCGCCAAAAATTGACTGAAAATTTGGATTTTCTTTCCAAAAATTGACAGAA 
A 


This T1-T2 loop also modulates the C1/C2 short loops 
numbered 29223 to 29260 

A C1/C2 short loop on chromosome 5 whose identifier is
29261 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence

Seq. Id. = 218 Position = 1 to 54

AAAATTGACTGAAAATTTGAATTTCCAGCCAAAAATTGACTGAAAATTTGAATT 

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is
4291 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene Y43F8C.5 and has
the DNA sequence

Seq. Id. = 219 Position = 1 to 317



AAAATTAACTGAAAATTTGAATTTCCCGCCAAAAATTGACTGAAAATTTGAATTTCC 
CGCCAAAAAAAATTGACTGAAAATTTGAATTTCCCGCCAAAAATTGACTGAAAATTT 
GAATTTCCCGCCAAAAATTAATTGAAAATTTGAATTTCCCGCCAAAAATTAATTGAA 
ACTTTGAATTTTCAA. . . ATTTCCCGCCAAAAATTAATTGAAACTTTGAATTTTCAA
ATTTCCCGCCAAAAATTGACTGAAAATTTGAATTTCCCGCCAAAAATTAATTGAAAA 
TTTGAATTTTTGAATTTCCCGCCAAAAATGACTGA 

The match between the T1 sequence and the C1/C2 sequence
is

TTTAAATTTCCCGCCAAAAATTGACTGAAAATTTG

Seq. Id. = 219 Position = 229 to 260

AAATTTCCCGCCAAAAATTGACTGAAAATTTG

The match between the T2 sequence and the C1/C2 sequence 
is 

Seq. Id. = 219 Position = 63 to 104

AAAAAAATTGACTGAAAATTTGAATTTCCCGCCAAAAATTGA 



10. One connectron controls many geneless
connectrons in single-celled and multi-celled
eukaryotes

One C1/C2 short loop can control the existence of many 
geneless T1-T2 long loops. 

Example of a single-celled geneless connectron - S.
cerevisiae

In this example the existence of the three T1-T2 (1142-1156,
1243-1272 and 7102-7117) long loops is controlled
by the C1/C2 (5289) short loop.
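
The one-to-many relationship in this example can be written down as a small lookup table. This is only an illustrative sketch: the dictionary layout and function names are hypothetical, and the rule that the short loops inside a long loop carry the identifiers strictly between the T1 and T2 identifiers is inferred from the listings (for example 1143 through 1155 inside 1142-1156), not stated as a general rule in the text.

```python
# Control short loop identifier -> list of (T1 id, T2 id) long loops it controls.
# Identifiers follow the diagram and detailed listing for this example.
CONTROLS = {
    5289: [(1142, 1156), (1243, 1272), (7102, 7117)],
}


def enclosed_short_loops(t1_id, t2_id):
    """Identifiers strictly between T1 and T2 (assumes sequential numbering)."""
    return list(range(t1_id + 1, t2_id))


def cascade(control_id, controls=CONTROLS):
    """Map each long loop controlled by `control_id` to its enclosed short loops."""
    return {loop: enclosed_short_loops(*loop)
            for loop in controls.get(control_id, [])}


for (t1, t2), inner in cascade(5289).items():
    print(f"T1-T2 ({t1}-{t2}) encloses short loops {inner[0]} through {inner[-1]}")
```

Laid out this way, it is easy to see that switching one C1/C2 short loop changes the state of three long loops and, through them, every short loop they enclose.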

5289 Chromosome 12
|

* * *

| Chromosome 4 |

1142 1156
| 1143 through 1155 |

5289 Chromosome 12
|

* * *

| Chromosome 4 |

1243 1272
| 1244 through 1271 |

5289 Chromosome 12
|

* * *

| Chromosome 15 |

7102 7117
| 7103 through 7116 |



A double stranded DNA loop of length 5.337 kilo-bases on
chromosome 4 is bounded on the left by a T1 sequence
whose identifier is 1142. This T1 control element has
the DNA sequence

Seq. Id. = 220 Position = 1 to 318

ATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGTATGTA 
GATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCTGCAATT 
CTACACAATTCTATAAATATTATTATCATCATTTTATATGTTAATATTCATTGATCC
TATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCTCA 
TCATTTGCGTCATCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATACTA 
GTTAGTAGATGATAGTTGATTTTTATTCCAACA 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 1156. This T2
control element has the DNA sequence

Seq. Id. = 221 Position = 1 to 295

TTTTAATAAGGCAATAATATTAGGTATGTAGATATACTAGAAGTTCTCCTCCAGGAT 
TTAGGAATCCATAAAAGGGAATCTGCAATTCTACACAATTCTATAAATATTATTATC 
ATCATTTTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCA 
GCTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTA 
TATGATAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATT
CCAACAAGAA 

There are no genes controlled by this T1/T2 loop. 

This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops



A C1/C2 short loop on chromosome 4 whose identifier is
1143 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence

Seq. Id. = 222 Position = 1 to 349

ATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGTATGTA 
GATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCTGCAATT 
CTACACAATTCTATAAATATTATTATCATCATTTTATATGTTAATATTCATTGATCC
TATTACATTATCAAT. . . CTCTAAGTCTCATTGCCTTTGTGCCAAAAAATCTGTTTC 
TAAATTTCTCTTCATTTGTAGACTTAATTATACTGATCGTTGATCTACTATCAGTAA 
GTAAGCCTTTAATAATTGGTTTCTTGTTAAGTTCTTGCACAAGGTGACTGAGGTTAT 
TCAATAGCGG 


This T1-T2 loop also modulates the C1/C2 short loops 
numbered 1144 to 1154 

A C1/C2 short loop on chromosome 4 whose identifier is
1155 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence

Seq. Id. = 223 Position = 1 to 69

GAGGAGAACTTCTAGTATATCTACATACCTAATATTATTGCCTTATTAAAAATGGAA 
TCCCAACAATTA 
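
Control and target sequences may lie on either strand, so it can help to compare a short loop against the reverse complement of its neighbours. The sketch below is an illustration only; the helper name is hypothetical, and the comparison it performs (Seq. Id. 223 against the first two listed lines of Seq. Id. 220) is an observation drawn from the listed sequences rather than a statement made in the text.

```python
# Translation table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite-strand sequence, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


SEQ_223 = ("GAGGAGAACTTCTAGTATATCTACATACCTAATATTATTGCCTTATTAAAAATGGAA"
           "TCCCAACAATTA")
# First two listed lines of Seq. Id. 220 (the T1 element 1142).
SEQ_220_START = ("ATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGTATGTA"
                 "GATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCTGCAATT")

offset = SEQ_220_START.find(reverse_complement(SEQ_223))
print("found on opposite strand" if offset >= 0 else "not found")
```

With these inputs the reverse complement of the short loop 1155 sequence occurs inside the start of the T1 sequence 1142, which suggests the two are listed from opposite strands of the same repeated element.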

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops.



A C1/C2 short loop on chromosome 12 whose identifier is
5289 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YLR301W and has
the DNA sequence

Seq. Id. = 224 Position = 1 to 324

GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGT 
ATGTAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCT
GCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGTTAATATTCAT 
TGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTAT 
TTCTCATCATTTGCGTCATCTTCTAACACCGTATATGATAATATACTAGTACGTAAA 
TACTAGTTAGTAGATGATAGTTGATTTTTATTCCAACAC 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 224 Position = 6 to 64

ATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGTATGTA 
GA 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 224 Position = 33 to 64



TTTTAATAAGGCAATAATATTAGGTATGTAGA 




A double stranded DNA loop of length 5.251 kilo-bases on
chromosome 4 is bounded on the left by a T1 sequence
whose identifier is 1243. This T1 control element has
the DNA sequence

Seq. Id. = 225 Position = 1 to 366

CGTGTTTTATCTCATGTTGTTCGTTTTGTTATTGAGATATATGTGGGTAATTAGATA 
ATTGTTGGGATTCCATTGTTGATAAAGGCTATAATATTAGGTATACAGAATATACTA
GAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCTGCAATTCTACACAAT
TCTATAAATATTATTATCATCGTTTTATATGTTAATATTCATTGATCCTATTACATT 
ATCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCTCATCATTTGCG 
TCATCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATACTAGTTAGTAGA 
TGATAGTTGATTTTTATTCCAACA 


This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 1272. This T2 
control element has the DNA sequence 

Seq. Id. = 226 Position = 1 to 273

TGAGATATATGTGGGTAATTAGATAATTGTTGGGATTCCATTGTTGATAAAGGCTAT 
AATATTAGGTATACAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAA 
AAGGGAATCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGT 
TAATATTCATTGATC. . . TATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTT
GATTTTTATTCCAACAGTTATAAGGTTGTTTCATATGTGTTTTATGAA 

There are no genes controlled by this T1/T2 loop. 

This long T1/T2 double stranded DNA loop modulates the
expression of the following C1/C2 short loops



A C1/C2 short loop on chromosome 4 whose identifier is
1244 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence

Seq. Id. = 227 Position = 1 to 327

TTTATCTCATGTTGTTCGTTTTGTTATTGAGATATATGTGGGTAATTAGATAATTGT 
TGGGATTCCATTGTTGATAAAGGCTATAATATTAGGTATACAGAATATACTAGAAGT 
TCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCTGCAATTCTACACAATTCTAT
AAATATTATTATCAT . . . GTCTCGATGTAGTATACGTATAAATTATTACCTGATACT 
TCATCTCTAAGTCTCATTGCCTTTGTGCCAAAAAATCTGTTTCTAAATTTCTCTTCA 
TTTGTAGACTTAATTATACTGATCGTTGATCTACTATCAGTAAGT 

This T1-T2 loop also modulates the C1/C2 short loops
numbered 1245 to 1270

A C1/C2 short loop on chromosome 4 whose identifier is
1271 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence

Seq. Id. = 228 Position = 1 to 309

TGTTGTATCTCAAAATGAGATATGTCAGTATGACAATACGTCATCCTAAACGTTCAT
AAAACACATATGAAACAACCTTATAACTGTTGGAATAAAAATCAACTATCATCTACT 
AACTAGTATTTACGTTACTAGTATATTATCATATACGGTGTTAGAAGATGACGCAAA 
TGATGAGAAATAGTC. . . CAACAATGGAATCCCAACAATTATCTAATTACCCACATA 
TATCTCATGGTAGCGCCTGTGCTTCGGTTACTTCTAAGGAAGTCCACACAAATCAAG 

ATCCGTTAGACGTTTCAGCTTCCAAAA



The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 12 whose identifier is
5289 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YLR301W and has
the DNA sequence

Seq. Id. = 229 Position = 1 to 325

GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGT 
ATGTAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCT 
GCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGTTAATATTCAT 
TGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTAT
TTCTCATCATTTGCGTCATCTTCTAACACCGTATATGATAATATACTAGTAACGTAA 
ATACTAGTTAGTAGATGATAGTTGATTTTTATTCCAACAC 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 229 Position = 62 to 317

AGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCTGCAA 
TTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGTTAATATTCATTGAT
CCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCT 
CATCATTTGCGTCATCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATAC 
TAGTTAGTAGATGATAGTTGATTTTTATTCCAACA 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 229 Position = 62 to 317

AGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCTGCAA 
TTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGTTAATATTCATTGAT 
CCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCT
CATCATTTGCGTCATCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATAC 
TAGTTAGTAGATGATAGTTGATTTTTATTCCAACA 



A double stranded DNA loop of length 5.296 kilo-bases on
chromosome 15 is bounded on the left by a T1 sequence
whose identifier is 7102. This T1 control element has
the DNA sequence

Seq. Id. = 230 Position = 1 to 365

CATGATTAATATGACCAATCGGCGTGTGTTTTTGAAAAGTGGGTGAATTTTGAGATA 
ATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGTATGTAGAATGTACTAG 
AAGTTCTCCTCAAGGATTTAGGAATCCATGAAAGGGAATCTGCAATTCTACACAATT
CTATAAATATTATTATCATCATTTTATATGTTAATATTCATTGATCCTATTACATTA 
TCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGT 
CATCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATACTAGTTAGTAGAT 
GATAGTTGATTTTTATTCCAACA 


This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 7117. This T2
control element has the DNA sequence

Seq. Id. = 231 Position = 1 to 365



TGAAAAGTGGGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATA 
ATATTAGGTATGTAGAATGTACTAGAAGTTCTCCTCAAGGATTTAGGAATCCATGAA 
AGGGAATCTGCAATTCTACACAATTCTATAAATATTATTATCATCATTTTATATGTT 
AATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATTTA 
GATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGATAATATACTA
GTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATTCCAACAGTTTTATAT 
ACCTCTCTTATTTAGTATAAGAA 

There are no genes controlled by this T1/T2 loop. 


This long T1/T2 double stranded DNA loop modulates the 
expression of the following C1/C2 short loops 

A C1/C2 short loop on chromosome 15 whose identifier is
7103 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence

Seq. Id. = 232 Position = 1 to 357

AAGAACATTGCTGATGTGATGACAAAACCTCTTCCGATAAAAACATTTAAACTATTA 
ACTAACAAATGGATTCATTAGATCTATTACATTATGGGTGGTATGTTGGAATAAAAA 
TCAACTATCATCTACTAACTAGTATTTACGTTACTAGTATATTATCATATACGGTGT 
TAGAAGATGACGCAAATGATGAGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGC 
AAGGATTGATAATGTAATAGGATCAATGAATATTAACATATAAAATGATGATAATAA
TATTTATAGAATTGTGTAGAATTGCAGATTCCCTTTCATGGATTCCTAAATCCTTGA 
GGAGAACTTCTAGTA 

This T1-T2 loop also modulates the C1/C2 short loops
numbered 7104 to 7115



A C1/C2 short loop on chromosome 15 whose identifier is
7116 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence

Seq. Id. = 233 Position = 1 to 66

CCATTCTGTGGAGGTGGTACTGAAGCAGGTTGAGGAGAGACATGATGATGGTTCTCT 
GGAACAGCT 


The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 12 whose identifier is
5289 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YLR301W and has
the DNA sequence

Seq. Id. = 234 Position = 1 to 325

GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGT 
ATGTAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCT 
GCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGTTAATATTCAT 
TGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTAT
TTCTCATCATTTGCGTCATCTTCTAACACCGTATATGATAATATACTAGTAACGTAA 
ATACTAGTTAGTAGATGATAGTTGATTTTTATTCCAACAC 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 234 Position = 1 to 66



GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGT 
ATGTAGAAT 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 234 Position = 1 to 66

GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGT
ATGTAGAAT 



Example of a multi-celled geneless connectron - C.
elegans

In this example the existence of the three T1-T2 (3101-3120,
14840-15042 and 15365-15627) long loops is
controlled by the C1/C2 (16760) short loop.



16760 Chromosome 4

A double stranded DNA loop of length 15.894 kilo-bases on
chromosome 1 is bounded on the left by a T1 sequence
whose identifier is 3101. This T1 control element has
the DNA sequence

Seq. Id. = 235 Position = 1 to 33

CAAATCGGCAAATTGCCGGAATTGAACATTTCC 


This double stranded DNA loop is bounded on the right by 
a T2 control element whose identifier is 3120. This T2 
control element has the DNA sequence 

Seq. Id. = 236 Position = 1 to 54

AAACGATTTTTCCGGCAAATCGGCAAATTGCCGGAATTGTAATTTCCGGCAAAT 

There are no genes controlled by this T1/T2 loop. 


This long T1/T2 double stranded DNA loop modulates the 
expression of the following C1/C2 short loops 

A C1/C2 short loop on chromosome 1 whose identifier is
3103 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence

Seq. Id. = 237 Position = 1 to 55



TTAAAATTTCCGGCAAATCGGCAAATTGGCAGAAATGAAACTCACGGCAAATCGG 

This T1-T2 loop also modulates the C1/C2 short loops
numbered 3104 to 3118

A C1/C2 short loop on chromosome 1 whose identifier is
3119 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence

Seq. Id. = 238 Position = 1 to 61

CCCGCATTTTTTGTAGATCAAACCGTAATGGGACGGCCTGGCAACACGTGATTTTCC 
AAAT

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 4 whose identifier is
16760 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene T23E1.2 and has
the DNA sequence

Seq. Id. = 239 Position = 1 to 124

GGCAAATTGCCGAAATTGAACATTTCCGGCAAATCGGCAAATTGCCGGAATTGAACA 
TTTCCGGCAAATCGGCAAATTGCCGGAATTGAACATTTCCGGCAAATCGGCAAATTG 
CCGGAATTGA



The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 239 Position = 30 to 62

CAAATCGGCAAATTGCCGGAATTGAACATTTCC 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 239 Position = 23 to 53

TTTCCGGCAAATCGGCAAATTGCCGGAATTG 

A double stranded DNA loop of length 86.977 kilo-bases on
chromosome 3 is bounded on the left by a T1 sequence
whose identifier is 14840. This T1 control element has
the DNA sequence

Seq. Id. = 240 Position = 1 to 141

AAAAATTTCCGGCAAGTCGGCAATTTTCCGAAAATGAAAATTTCCGGCAAATCGGCA 
AATTGCCGGAATTGAAAATTCCTGGCAAATCAGCAAATTTGCGGCAAATCGGCAATT
TGCCGAAAATGAAAATTTCCGGCAAAT 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 15042. This T2
control element has the DNA sequence

Seq. Id. = 241 Position = 1 to 98



CAAATCGGTAGGTAAATTGGCCAAACTTGAAAATTTCCGGCAAATCGGCAAATTCCG 
CGAACTGAACATTTCCGGCAAATCGGCAAATTGCTCGAACT 



There are no genes controlled by this T1/T2 loop.

This long T1/T2 double stranded DNA loop modulates the 
expression of the following C1/C2 short loops 

A C1/C2 short loop on chromosome 3 whose identifier is
14841 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence

Seq. Id. = 242 Position = 1 to 141

AAAAATTTCCGGCAAGTCGGCAATTTTCCGAAAATGAAAATTTCCGGCAAATCGGCA 
AATTGCCGGAATTGAAAATTCCTGGCAAATCAGCAAATTTGCGGCAAATCGGCAATT 
TGCCGAAAATGAAAATTTCCGGCAAAT 


This T1-T2 loop also modulates the C1/C2 short loops 
numbered 14842 to 15040 

A C1/C2 short loop on chromosome 3 whose identifier is
15041 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence

Seq. Id. = 243 Position = 1 to 55

CGGCAATTGCCGTTCGGCAATTTGCCAATTTGCCGGAAATTTTCAATTCCGGCAA 

The expression of genes in this T1/T2 long loop is
controlled by the following C1/C2 short loops. 



A C1/C2 short loop on chromosome 4 whose identifier is
16760 controls the expression of the genes in this T1/T2
long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene T23E1.2 and has
the DNA sequence

Seq. Id. = 244 Position = 1 to 124

GGCAAATTGCCGAAATTGAACATTTCCGGCAAATCGGCAAATTGCCGGAATTGAACA 
TTTCCGGCAAATCGGCAAATTGCCGGAATTGAACATTTCCGGCAAATCGGCAAATTG 
CCGGAATTGA 

The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 244 Position = 22 to 55

ATTTCCGGCAAATCGGCAAATTGCCGGAATTGAA 

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 244 Position = 17 to 45

TGAACATTTCCGGCAAATCGGCAAATTGC






A double stranded DNA loop of length 98.488 kilo-bases on
chromosome 3 is bounded on the left by a T1 sequence
whose identifier is 15365. This T1 control element has
the DNA sequence

Seq. Id. = 245 Position = 1 to 336

AAAATTTCCGGCAAATCGGCAATTTGCCAAAAATTGAAATTTCCGGCAAATCGGCAA 
TTTGTCAAAAATGAAAATTTCCGGCAAATCGGCAAATTGCCGAAAATGAAAATTTCC 
GGCAAATCGGCAAACTTCCGGAACTGAAAATTTCCGGCAAATCGGCAATTTGCCATA
AATGAACATTTCCGG. . . GGCGAAAATTAAAATTTCCGCCATATCGGCAATTTGCCA 
AAAAATTAAAATTTCCGGCAAATCGGCAAATTGCCGGAATTCAAAATTTCCGGCAAA 
CCGGCAAATTGCCGGAACTCAAAATTCCCGGCAAATCAGCAAATTGCCGGAATT 

This double stranded DNA loop is bounded on the right by
a T2 control element whose identifier is 15627. This T2
control element has the DNA sequence

Seq. Id. = 246 Position = 1 to 68

TGGCAAACCGGCAAATTGCCGGAATTGAACATTTCCGGCAAATCGGCAATTTGCCGG 
AATTGAAATTT 

There are no genes controlled by this T1/T2 loop. 


This long T1/T2 double stranded DNA loop modulates the 
expression of the following C1/C2 short loops 

A C1/C2 short loop on chromosome 3 whose identifier is
15366 controls the expression of the genes of one or more
other T1/T2 long loops. This C1/C2 short loop has the
DNA sequence



TGCCGATTTGCCGGAAATTTTCATTTTCGGCAATTTGCCGATTTGCCGGAAATTTTC 
ATT 

This T1-T2 loop also modulates the C1/C2 short loops 
numbered 15366 to 15624 

A C1/C2 short loop on chromosome 3 whose identifier is 
15625 controls the expression of the genes of one or more 
other T1/T2 long loops. This C1/C2 short loop has the 
DNA sequence 

Seq. Id, = 248 Position = 1 to 54 

TCAAGCAAATTGTCAAATTCGCGGAACTAAACATTTCCGGCAAATCGGCAAATT 

The expression of genes in this T1/T2 long loop is 
controlled by the following C1/C2 short loops. 

A C1/C2 short loop on chromosome 4 whose identifier is 
16760 controls the expression of the genes in this T1/T2 
long loop. This C1/C2 short loop is expressed as a RNA 
single strand that is 3'UTR to the gene T23E1.2 and has 
the DNA sequence 

Seq. Id. = 249 Position = 1 to 124 

GGCAAATTGCCGAAATTGAACATTTCCGGCAAATCGGCAAATTGCCGGAATTGAACA 
TTTCCGGCAAATCGGCAAATTGCCGGAATTGAACATTTCCGGCAAATCGGCAAATTG 
CCGGAATTGA 



The match between the T1 sequence and the C1/C2 sequence
is

Seq. Id. = 249 Position = 22 to 52

ATTTCCGGCAAATCGGCAAATTGCCGGAATT

The match between the T2 sequence and the C1/C2 sequence
is

Seq. Id. = 249 Position = 35 to 75

CGGCAAATTGCCGGAATTGAACATTTCCGGCAAATCGGCAA



Claims 

What is claimed is:

1. A method of identifying DNA sequences that control the expression of
different collections of genes in a genome comprising, detecting selected DNA
sequences adjacent to some genes excluding exons and introns.

2. A method of identifying DNA sequences that control the expression of
different collections of genes comprising, detecting, by computer, one or more pairs
of non adjacent DNA sequences to which are bound two RNA sequences.

3. A method of identifying DNA sequences that control the expression of
different collections of genes in a genome comprising detecting changes in
connectron behavior in the genome.

4. A method of modifying the expression of different gene collections in a
genome, comprising detecting changes in connectron behavior as a result of an
exogenous stimulus.

A method of detecting where and when new genes are being integrated into a
host genome comprising detecting the connectrons in said host genome.

A method of detecting the expression effect of different gene collections in a
given body comprising detecting the back and forth flow of connectrons between the
chromosomes thereof.

A method of modifying a given body comprising modifying the connectron
organization therein.
A method of detecting connectron control and target sequences in a given
genome comprising:

determining the base composition of said genome,
determining one or more sites of control sequence organization, and/or
determining one or more sites of target application.
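
The first step of this claim, determining base composition, can be illustrated with a short sketch. It assumes the genome is available as a plain string of A, C, G and T; the function name is hypothetical, and other symbols (such as N) are simply ignored.

```python
from collections import Counter


def base_composition(seq):
    """Return the fraction of A, C, G and T among the recognised bases."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


# T2 control element 29262 (Seq. Id. 216) used as a small input.
print(base_composition("TGAAAATTTGAATTTCCCGCCAAAAATTAAC"))
```

Base composition matters because it sets how often a 20-base match would be expected by chance, which is one way to judge whether a candidate control sequence is notable.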

A method of determining the response of a cell in any tissue to changes in the
cell's environment and/or genetic composition comprising providing a complete
genomic DNA sequence for the organism and determining the effect of changes in
connectrons due to application of a given exogenous stimulus to the genome.

In prokaryotes, archea, single celled eukaryotes and multi celled eukaryotes,
the tetradic relationship T1-C1 and T2-C2 where T1 and T2 are DNA sequences 20
or more bases in length, where the C1 sequence is adjacent to the C2 sequence, where
the T1 and T2 sequences are on the same chromosome, and where the C1/C2
sequences are on the same chromosome as T1 and T2 or where the C1/C2 sequences
are on a chromosome different from T1 and T2, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 10 or more bases
such that the C1 sequence is adjacent to the C2 sequence,
T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.
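
The constraints in this definition can be collected into a single check. The sketch below is a hedged illustration, not the application's algorithm: the `Element` record, its coordinate convention (0-based start, exclusive end), the way the T1-T2 separation is measured, and the reading of "adjacent" as abutting or overlapping are all assumptions made for the example.

```python
from dataclasses import dataclass

MIN_LEN = 20          # minimum length of C1, C2, T1 and T2 in bases
MIN_SPAN = 1_000      # about 1kb
MAX_SPAN = 105_000    # about 105kb


@dataclass(frozen=True)
class Element:
    chromosome: str
    start: int        # 0-based, inclusive (hypothetical convention)
    end: int          # exclusive

    @property
    def length(self):
        return self.end - self.start


def is_candidate_connectron(c1, c2, t1, t2):
    """Check the length, adjacency, same-chromosome and distance constraints."""
    long_enough = all(e.length >= MIN_LEN for e in (c1, c2, t1, t2))
    # Assumption: "adjacent" means C2 starts inside or right at the end of C1.
    c_adjacent = (c1.chromosome == c2.chromosome
                  and c1.start <= c2.start <= c1.end)
    same_chromosome = t1.chromosome == t2.chromosome
    span = t2.start - t1.end   # assumed measure of the T1-T2 separation
    return (long_enough and c_adjacent and same_chromosome
            and MIN_SPAN <= span <= MAX_SPAN)


# Made-up coordinates loosely modelled on the 67.059 kilo-base loop example.
t1 = Element("1", 100_000, 100_037)
t2 = Element("1", 167_059, 167_096)
c1 = Element("5", 2, 38)
c2 = Element("5", 28, 65)
print(is_candidate_connectron(c1, c2, t1, t2))
```

The C1/C2 pair may sit on a different chromosome from T1 and T2, so the check compares chromosomes only within each pair.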

In prokaryotes, archea, single celled eukaryotes and multi celled eukaryotes,
the connectron relationship that permits many different C1/C2 short loops to control
the existence of a T1-T2 long loop and wherein said C1/C2 short loops can be on the
same chromosome or on different chromosomes from the T1-T2 long loop, wherein:



C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 10 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.




In prokaryotes, archea, single celled eukaryotes and multi celled eukaryotes,
the connectron relationship that permits one C1/C2 short loop to control the existence
of many T1-T2 long loops, the C1/C2 short loop can be on the same chromosome or
on different chromosomes from the T1-T2 long loops, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 10 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.



The connectron relationships between prokaryotes and their plasmids wherein
said connectrons implement a control mechanism between the two genomes that
makes it possible for them to form a symbiotic relationship, and in the case of D.
radiodurans the relationship is not symmetric, and the D. radiodurans genome sends
C1/C2 short loops to the MP1 plasmid, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.

The connectron relationships that exist in plant and higher animals.

In prokaryotes, archea, single celled eukaryotes and multi celled eukaryotes,
the connectron relationship that permits one C1/C2 short loop to control the existence
of one or more T1-T2 long loops without being subject to any expression controls
other than those of the gene to which the C1/C2 is 3'UTR, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,



such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart,

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart, and

3'UTR - the untranslated 3' end of an mRNA, beyond the end of the last exon; a
stop codon in the mRNA causes the ribosome to stop the translation of mRNA
into protein.
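As an illustration of the 3'UTR definition above, the sketch below returns the part of an mRNA that follows the first in-frame stop codon. It is a simplified example, not part of the specification: the function name and the `cds_start` parameter (the index of the first base of the start codon) are hypothetical, and the three standard stop codons UAA, UAG and UGA are assumed.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard stop codons, RNA alphabet


def three_prime_utr(mrna: str, cds_start: int) -> str:
    """Return the bases after the first in-frame stop codon, or "" if none is found.

    cds_start is a hypothetical parameter: the index of the first base of the
    start codon, which fixes the reading frame.
    """
    for pos in range(cds_start, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            # Everything past the stop codon is untranslated.
            return mrna[pos + 3:]
    return ""
```

Under these assumptions a C1/C2 short loop would be searched for only within the returned region.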



the connectron relationship that permits one C1/C2 short loop to control the existence
of one or more T1 T2 long loops such that this C1/C2 short loop is itself subject to
expression control by another T1 T2 long loop which surrounds it, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.

In prokaryotes, archea, single celled eukaryotes and multi celled eukaryotes,
the connectron relationship that permits one C1/C2 short loop to control the existence
of the T1 T2 long loop that surrounds it, wherein:






C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 50 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.

loop, wherein:

T1 sequence is any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, and the T2 or T1
sequences must be between about 1kb and 105kb apart.

The geneless connectron relationship where one C1/C2 short loop controls the
existence of many geneless T1 T2 long loops, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.
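Whether a T1 T2 long loop is geneless can be checked by listing the genes that fall between the two target positions. The sketch below is illustrative only: the gene tuples, the function names, and the rule that a gene counts as inside the loop only when it lies wholly between the T1 and T2 start positions are assumptions rather than definitions from the specification.

```python
def genes_inside_loop(t1_pos, t2_pos, genes):
    """Return names of genes lying wholly between T1 and T2.

    genes is a hypothetical list of (name, start, end) tuples with
    start <= end, all on the same chromosome as T1 and T2.
    """
    left, right = sorted((t1_pos, t2_pos))
    return [name for name, start, end in genes
            if left <= start and end <= right]


def is_geneless(t1_pos, t2_pos, genes):
    """True when no gene lies wholly between T1 and T2."""
    return not genes_inside_loop(t1_pos, t2_pos, genes)
```

The same `genes_inside_loop` list gives, for a loop that is not geneless, the set of genes that the loop would remove from expression.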



Abstract

An algorithm has been developed to identify four DNA
sequences of 20 bases or more that form a structure
called a connectron. Two sequences C1 and C2 are
adjacent to each other. These sequences are expressed as
RNA in the 3'UTR of some genes in many prokaryotic,
archea and eukaryotic genomes. The other half of a
connectron is two DNA sequences T1 and T2 that are on the
same chromosome and lie about 1kb to 105kb apart. The C1
sequence is identical to the T1 sequence and the C2
sequence is identical to the T2 sequence. C1/C2 and T1-T2
can be on different chromosomes. The C1/C2 RNA sequence
of the gene transcript finds the two double-stranded DNA
sequences T1 and T2. The single-stranded RNA and
double-stranded DNA then form a triple-stranded Hoogsteen
helix of the RNA/DNA/DNA variety. Because the C1 sequence
is adjacent to the C2 sequence, the T1 sequence is made
spatially adjacent to the T2 sequence in a compact
X-shaped structure. Chromatin particles form as compact
30nm assemblies in the DNA between T1 and T2, thus
eliminating the intervening genes from promotion and
expression. Connectrons remove sets of genes from
expression and thus modulate the behavior of many types
of cells.
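The search summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of the specification: it uses exact matches of fixed 20-base windows, searches the forward strand only, measures the T1-T2 separation between start positions, and all function and parameter names are hypothetical.

```python
from collections import defaultdict

MIN_LEN = 20        # minimum length of C1, C2, T1 and T2, in bases
MIN_GAP = 1_000     # about 1kb
MAX_GAP = 105_000   # about 105kb


def index_kmers(chromosome, k=MIN_LEN):
    """Map every k-base substring of the chromosome to its start positions."""
    index = defaultdict(list)
    for pos in range(len(chromosome) - k + 1):
        index[chromosome[pos:pos + k]].append(pos)
    return index


def find_connectrons(control, chromosome, k=MIN_LEN):
    """Yield (c1_pos, t1_pos, t2_pos) for each candidate connectron.

    C1 is control[c1_pos:c1_pos + k] and C2 is the k bases immediately
    after it, so C1 and C2 are adjacent. T1 and T2 are exact copies of
    C1 and C2 on the chromosome (forward strand only, an assumption).
    """
    index = index_kmers(chromosome, k)
    for c1_pos in range(len(control) - 2 * k + 1):
        c1 = control[c1_pos:c1_pos + k]
        c2 = control[c1_pos + k:c1_pos + 2 * k]
        for t1_pos in index.get(c1, []):
            for t2_pos in index.get(c2, []):
                # T1 and T2 must lie about 1kb to 105kb apart.
                if MIN_GAP <= abs(t2_pos - t1_pos) <= MAX_GAP:
                    yield (c1_pos, t1_pos, t2_pos)
```

Here `control` would be the DNA corresponding to a 3'UTR and `chromosome` the sequence in which T1 and T2 are sought. A fuller treatment would also scan the reverse complement and allow matches longer than 20 bases.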